Voting methods for feature ranking in efs #112
Merged
75 commits
a5d1b38 add stability selection article (bblodfon)
4cc3815 add Rcpp code for approval voting feature ranking method (bblodfon)
21ae7d7 add citation (bblodfon)
ccffa4b extra check during init() (bblodfon)
108ddc2 update doc + use the Rcpp interface for approval voting (bblodfon)
589df2e add templates for params in ArchiveBatchFSelect + updocs (bblodfon)
e520c77 use testthat expectations (not checkmate ones!) (bblodfon)
0ecc618 add test for newly implemented voting methods (bblodfon)
2622c96 update test for av (bblodfon)
97f21c4 fix note (bblodfon)
f84f91c refactor AV_rcpp, add SAV_rcpp (bblodfon)
3614d93 add norm_score, and SAV R function (bblodfon)
0a1eb49 add sav, improve doc (bblodfon)
fc5d24d fix efs test (bblodfon)
6df3bbd update and improve test for AV (bblodfon)
fc86503 add sav test (bblodfon)
0d9eccf Merge branch 'main' into voting_methods (bblodfon)
87d68d4 add borda score (bblodfon)
fa05f09 update tests (bblodfon)
6a89966 add seq and revseq PAV Rcpp methods (bblodfon)
5c09975 add R functions for the PAV methods (bblodfon)
103bf45 comment printing (bblodfon)
ff17d11 add tests for PAV methods (bblodfon)
b6f4b5e add PAV methods to efs (bblodfon)
3a248cf refactor: do not use C++ RNGs (bblodfon)
92ce0df fix startsWith (bblodfon)
283003e updocs (bblodfon)
567f456 fix data.table note (bblodfon)
e55ae24 add committee_size parameter, refactor borda score (bblodfon)
9a37e60 add large data test for seq pav (bblodfon)
58ab928 refactor C++ code, add optimized PAV (bblodfon)
61c0907 remove revseq-PAV method, use optimized seqPAV (bblodfon)
8654a38 update tests (bblodfon)
47e3dcf remove suboptimal seqPAV function (bblodfon)
b369c6e shuffle candidates outside Rcpp functions (same tie-breaking) (bblodfon)
6b7fb03 optimize Phragmen a bit => do not randomly select the candidate with … (bblodfon)
60065f9 add phragmen's rule in efs (bblodfon)
8ffa44f correct borda score + use phragmens rule (bblodfon)
852ff35 add tests for Phragmen's rule (bblodfon)
5623812 correct weighted Phragmen's rule (bblodfon)
7e3be3e add specific test for phragmen's rule (bblodfon)
25387c4 Merge branch 'main' into voting_methods (bblodfon)
1eef6c6 run document() (bblodfon)
f2ccbda show data.table result after using ':=' (bblodfon)
bea5e39 add n_resamples field + nicer obj print (bblodfon)
2d21fc7 cover edge case (eg lasso resulted in no features getting selected) (bblodfon)
ad9fd2e Merge branch 'main' into voting_methods (bblodfon)
7f3ab3b updocs (bblodfon)
4137404 small styling fix (bblodfon)
d151303 add Stabl ref (bblodfon)
83529b6 more descriptive name (bblodfon)
49bb097 add embedded ensemble feature selection (bblodfon)
6f3923f remove print() (bblodfon)
123624e add TOCHECK comment on benchmark design (bblodfon)
0581cdc use internal valid task (be-marc)
14acd73 simplify (be-marc)
81b475d ... (be-marc)
79747ad store_models = FALSE (be-marc)
331f231 ... (be-marc)
081acc8 separate the use of inner_measure and measure used in the test sets (bblodfon)
efc0155 updocs (bblodfon)
0e2f93f update tests (bblodfon)
3bca203 Merge branch 'main' into voting_methods (bblodfon)
d457221 refactor: expect_vector => expect_numeric (bblodfon)
9cb56b1 fix partial arg match (bblodfon)
cc36179 fix example (bblodfon)
816376a use fastVoteR for feature ranking (bblodfon)
3dae249 pass named list to callback parameter (be-marc)
fd5afbc skip test if fastVoteR is not available (bblodfon)
c937024 refactor: better handling of inner measure (bblodfon)
8e506c8 add tests for embedded_ensemble_fselect() (bblodfon)
3bd1772 update NEWs (bblodfon)
9e05dca add active_measure field (bblodfon)
832bd7f remove Remotes as fastVoteR is now on CRAN :) (bblodfon)
8c0d73f refine doc (bblodfon)
Can we say that differently? Scoring on a train set sounds wrong. Is this the outer train set which is split by the inner resampling? We score the inner resample result?
Yes, it's the outer train set. The `inner_resampling` generates N train/test splits. The `inner_measure` is used to optimize/tune on the train set, and you get the best subset and final model + score on that train set. We use these final models to also score the corresponding test splits (the inner resampling result you ask about), with the `measure`. In embedded `efs` we only do the second (no `inner_measure` is needed/used).
Can change the wording to specifically mention the train/test splits of the inner resampling (I also mention that earlier in the doc). What do you think?
It is the final model with the best subset and the corresponding performance estimated on the inner resampling. There is no scoring on the outer training set, only scoring on the inner resampling result. This is very similar to nested resampling. Maybe stick to the wording used below Figure 4.5:
https://mlr3book.mlr-org.com/chapters/chapter4/hyperparameter_optimization.html#sec-nested-resampling
Yes, sorry Marc, it's as you say. When I was writing the above comment, I meant the outer resampling (what we call `init_resampling`) as the one that generates the train/test splits. And yes, we are pretty much doing nested CV, with the outer resampling being the N-times holdout split. I will update the doc.
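
For readers following the thread, the nested structure agreed on above can be sketched roughly as follows. This is a plain-Python illustration, not the mlr3fselect implementation: the nearest-centroid learner, the exhaustive subset search, the accuracy measure, and all function names (`holdout_splits`, `kfold_splits`, `nested_feature_selection`, `make_data`) are assumptions made only for this example. The outer loop plays the role of `init_resampling` (N-times holdout), the inner k-fold plays the role of `inner_resampling`, the inner score corresponds to `inner_measure`, and the score on the outer test split corresponds to `measure`.

```python
import random
from itertools import combinations
from statistics import mean


def holdout_splits(n_rows, n_repeats, ratio, rng):
    """Outer resampling (hypothetical stand-in for init_resampling): N holdout splits."""
    splits = []
    for _ in range(n_repeats):
        idx = list(range(n_rows))
        rng.shuffle(idx)
        cut = int(ratio * n_rows)
        splits.append((idx[:cut], idx[cut:]))
    return splits


def kfold_splits(rows, k):
    """Inner resampling: k folds built from the rows of one outer train set."""
    folds = [rows[i::k] for i in range(k)]
    return [([r for j, f in enumerate(folds) if j != i for r in f], folds[i])
            for i in range(k)]


def fit_centroids(X, y, rows, feats):
    """Toy learner (assumption): one centroid per class over the chosen features."""
    cents = {}
    for label in set(y[r] for r in rows):
        members = [r for r in rows if y[r] == label]
        cents[label] = [mean(X[r][f] for r in members) for f in feats]
    return cents


def accuracy(cents, X, y, rows, feats):
    """Fraction of rows whose nearest centroid carries the true label."""
    def predict(r):
        return min(cents, key=lambda c: sum((X[r][f] - m) ** 2
                                            for f, m in zip(feats, cents[c])))
    return mean(1.0 if predict(r) == y[r] else 0.0 for r in rows)


def nested_feature_selection(X, y, n_repeats=5, ratio=0.7, k=3, seed=1):
    rng = random.Random(seed)
    n_feats = len(X[0])
    # Candidate subsets: all subsets of size 1 or 2 (exhaustive search for brevity).
    candidates = [c for size in (1, 2) for c in combinations(range(n_feats), size)]
    results = []
    for train, test in holdout_splits(len(X), n_repeats, ratio, rng):
        def inner_score(feats):
            # Performance estimated on the inner resampling of the outer train set.
            return mean(accuracy(fit_centroids(X, y, tr, feats), X, y, te, feats)
                        for tr, te in kfold_splits(train, k))
        best = max(candidates, key=inner_score)
        # Final model: refit on the whole outer train set with the best subset,
        # then score it on the corresponding outer test split.
        final_model = fit_centroids(X, y, train, best)
        results.append({
            "features": best,
            "inner_score": inner_score(best),
            "outer_score": accuracy(final_model, X, y, test, best),
        })
    return results


def make_data(n=60, seed=0):
    """Synthetic data: feature 0 separates the two classes, features 1-2 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([label * 6 + rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_data()
    for res in nested_feature_selection(X, y):
        print(res)
```

Each element of the returned list corresponds to one outer split and records the selected subset, its inner-resampling estimate, and the test-split score. Those per-split subsets are the kind of input the voting methods in this PR would then rank.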