recalibratePlpRefit returns a dataframe instead of an object of class runPlp #470

Volpym · 2024-07-12T15:57:43Z

Describe the bug
Running recalibratePlpRefit doesn't return an object of class runPlp that is recalibrated on the new data object, as stated in the documentation but an object of type data.frame. Am I missing something?

Set up (please run in R "sessionInfo()" and copy the output here):

R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                  LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8        LC_MONETARY=en_US.UTF-8      
 [6] LC_MESSAGES=en_US.UTF-8       LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8           LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8     
[11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] readr_2.1.5                  PatientLevelPrediction_5.0.5 FeatureExtraction_3.5.2      Andromeda_0.6.6              dplyr_1.1.4                 
[6] DatabaseConnector_6.3.2      memuse_4.2-3                

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1     remotes_2.4.2        purrr_1.0.2          splines_4.2.1        rJava_1.0-11         lattice_0.20-45      vctrs_0.6.5         
 [8] generics_0.1.3       utf8_1.2.4           blob_1.2.4           survival_3.3-1       rlang_1.1.4          pkgbuild_1.4.4       pillar_1.9.0        
[15] glue_1.7.0           withr_3.0.0          DBI_1.2.3            ParallelLogger_3.3.0 Cyclops_3.4.1        bit64_4.0.5          dbplyr_2.5.0        
[22] plyr_1.8.9           lifecycle_1.0.4      memoise_2.0.1        tzdb_0.4.0           callr_3.7.6          fastmap_1.2.0        ps_1.7.7            
[29] parallel_4.2.1       curl_5.2.1           fansi_1.0.6          urltools_1.7.3       triebeard_0.4.1      Rcpp_1.0.12          backports_1.5.0     
[36] checkmate_2.3.1      cachem_1.1.0         desc_1.4.2           vroom_1.6.5          jsonlite_1.8.8       bit_4.0.5            hms_1.1.3           
[43] processx_3.8.4       SqlRender_1.18.0     rprojroot_2.0.4      grid_4.2.1           cli_3.6.3            tools_4.2.1          magrittr_2.0.3      
[50] tibble_3.2.1         RSQLite_2.3.7        crayon_1.5.3         pkgconfig_2.0.3      Matrix_1.5-0         pROC_1.18.5          rstudioapi_0.16.0   
[57] R6_2.5.1             compiler_4.2.1

To Reproduce

cdmDatabaseSchema <- 'demo_cdm'
cdmDatabaseName <- 'postgres'
dbms <- "postgresql"
user <- 'postgres'
pw <- 'postgres'
server <- '<ip>/postgres'
port <- '5438'

connectionDetails <- DatabaseConnector::createConnectionDetails(dbms = dbms,
                                                                server = server,
                                                                user = user,
                                                                password = pw,
                                                                port = port)
databaseDetails <- PatientLevelPrediction::createDatabaseDetails(  connectionDetails,
                                           cdmDatabaseSchema,
                                           cdmDatabaseName,
                                           tempEmulationSchema = cdmDatabaseSchema,
                                           cohortDatabaseSchema = cdmDatabaseSchema,
                                           cohortTable = "sinusitiscohort",
                                           outcomeDatabaseSchema = cdmDatabaseSchema,
                                           outcomeTable = "sinusitiscohort",
                                           outcomeIds = 5,
                                           cdmVersion = 5.3,
                                           cohortId = 3)

covariateSettings <- FeatureExtraction::createCovariateSettings(
  useDemographicsGender = T, 
  useDemographicsAgeGroup = T, 
  useConditionGroupEraLongTerm = T
), 

restrictPlpDataSettings <- PatientLevelPrediction::createRestrictPlpDataSettings(
  studyStartDate = "",
  studyEndDate = "",
  firstExposureOnly = F,
  washoutPeriod = 0,
  sampleSize = NULL
)

plpData <- PatientLevelPrediction::getPlpData(databaseDetails, covariateSettings, restrictPlpDataSettings)
outcomeId <- 5
pop1 <- PatientLevelPrediction::createStudyPopulationSettings(riskWindowStart = 1, 
                                      riskWindowEnd = 365,
                                      requireTimeAtRisk = T, 
                                      minTimeAtRisk = 364, 
                                      includeAllOutcomes = T)

newPopulation <- PatientLevelPrediction::createStudyPopulation(plpData, outcomeId,pop1, NULL)
model<-loadPlpModel('<path_to_rds>')
recalibratedModel <- recalibratePlpRefit(model, newPopulation, plpData)

Context
I have divided the Eunomia dataset into three chunks. I trained a PLP model on the first chunk and now I am recalibrating it using the data from the second chunk. Next, I plan to recalibrate this model again using the third chunk of data. Finally, I will compare this final recalibrated model with the PLP model trained on the entire Eunomia dataset.

The text was updated successfully, but these errors were encountered:

jreps · 2024-07-12T20:12:39Z

The documentation is definitely wrong. Thanks for reporting this. It appears to return the prediction for the new population with the original model and recalibrated model. It also adds attributes for how the model needed to be adjusted, but these seem hidden. It does refit the model, so this could be edited to return a runPlp, but I need to see whether it is used inside other functions before editing.

Volpym · 2024-07-14T08:01:37Z

Could you provide some information on how to generate the runPlp object myself?

egillax · 2024-11-08T07:50:33Z

Hi @Volpym,

I'm now looking at the recalibration code a bit in relation to a study I'm doing. First, if you want to return the model from the recalibratePlpRefit something like the following should work:

Modified recalibratePlpRefit

recalibratePlpRefit <- function(
    plpModel,
    newPopulation,
    newData, 
    returnModel = FALSE) {
  checkNotNull(plpModel)
  checkNotNull(newPopulation)
  checkNotNull(newData)
  checkIsClass(plpModel, "plpModel")
  checkIsClass(newData, "plpData")
  checkBoolean(returnModel)

  # get selected covariates
  includeCovariateIds <- plpModel$covariateImportance %>%
    dplyr::filter(.data$covariateValue != 0) %>%
    dplyr::select("covariateId") %>%
    dplyr::pull()

  # check which covariates are included in new data
  containedIds <- newData$covariateData$covariateRef %>% dplyr::collect()
  noShrinkage <- intersect(includeCovariateIds, containedIds$covariateId)

  # add intercept
  noShrinkage <- append(noShrinkage, 0, 0)

  setLassoRefit <- setLassoLogisticRegression(
    includeCovariateIds = includeCovariateIds,
    noShrinkage = noShrinkage,
    maxIterations = 10000 # increasing this due to test code often not converging
  )

  newData$labels <- newPopulation

  newData$folds <- data.frame(
    rowId = newData$labels$rowId,
    index = sample(2, length(newData$labels$rowId), replace = TRUE)
  )

  # add dummy settings to fit model
  attr(newData, "metaData")$outcomeId <- attr(newPopulation, "metaData")$outcomeId
  attr(newData, "metaData")$targetId <- attr(newPopulation, "metaData")$targetId
  attr(newData, "metaData")$restrictPlpDataSettings <- attr(newPopulation, "metaData")$restrictPlpDataSettings
  attr(newData, "metaData")$covariateSettings <- newData$metaData$covariateSettings
  attr(newData, "metaData")$populationSettings <- attr(newPopulation, "metaData")$populationSettings
  attr(newData$covariateData, "metaData")$featureEngineeringSettings <- PatientLevelPrediction::createFeatureEngineeringSettings()
  attr(newData$covariateData, "metaData")$preprocessSettings <- PatientLevelPrediction::createPreprocessSettings()
  attr(newData, "metaData")$splitSettings <- PatientLevelPrediction::createDefaultSplitSetting()
  attr(newData, "metaData")$sampleSettings <- PatientLevelPrediction::createSampleSettings()

  newModel <- tryCatch(
    {
      fitPlp(
        trainData = newData,
        modelSettings = setLassoRefit,
        analysisId = "recalibrationRefit",
        analysisPath = NULL
      )
    },
    error = function(e) {
      ParallelLogger::logInfo(e)
      return(NULL)
    }
  )
  if (is.null(newModel)) {
    ParallelLogger::logInfo("Recalibration fit failed")
    return(NULL)
  }

  newModel$prediction$evaluationType <- "recalibrationRefit"

  oldPred <- predictPlp(
    plpModel = plpModel,
    plpData = newData,
    population = newPopulation,
    timepoint = 0
  )

  oldPred$evaluationType <- "validation"

  prediction <- rbind(
    oldPred,
    newModel$prediction[, colnames(oldPred)]
  )

  if (!is.null(newModel$covariateImportance)) {
    adjust <- newModel$covariateImportance %>%
      dplyr::filter(.data$covariateValue != 0) %>%
      dplyr::select(
        "covariateId",
        "covariateValue"
      )
  } else {
    adjust <- c()
  }

  newIntercept <- newModel$model$coefficients[names(newModel$model$coefficients) == "(Intercept)"]

  attr(prediction, "metaData")$recalibratePlpRefit <- list(adjust = adjust, newIntercept = newIntercept)
  
  if (returnModel) {
    return(model = newModel)
  } else {
    return(prediction)
  }
}

But this is not a runPlp object, but rather a plpModel object. I don't think it ever should have returned a runPlp object, it was probably some copy paster error in the docs.

But I've been thinking about our three types of recalibration, recalibrationInTheLarge, weakRecalibration and the recalibratePlpRefit . I think it would be good to get the recalibrated model out and ready to be used.

Could you tell me more about your use case and for example why you are looking at the recalibratePlpRefit rather than the other two methods ?

Egill

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

recalibratePlpRefit returns a dataframe instead of an object of class runPlp #470

recalibratePlpRefit returns a dataframe instead of an object of class runPlp #470

Volpym commented Jul 12, 2024 •

edited

Loading

jreps commented Jul 12, 2024

Volpym commented Jul 14, 2024

egillax commented Nov 8, 2024

recalibratePlpRefit returns a dataframe instead of an object of class runPlp #470

recalibratePlpRefit returns a dataframe instead of an object of class runPlp #470

Comments

Volpym commented Jul 12, 2024 • edited Loading

jreps commented Jul 12, 2024

Volpym commented Jul 14, 2024

egillax commented Nov 8, 2024

Volpym commented Jul 12, 2024 •

edited

Loading