cohorts converted to data.table/tidytable.... #1

coforfe · 2021-11-15T10:29:47Z

Hello Peer,

Thanks for your package. I find it very useful and convenient.

Since I am working with a large customer base, computing the monthly or daily cohorts takes a little while, so I converted your code to data.table (with a little bit of tidytable) and I already see an improvement. Even for the small online_cohorts dataset included in the package, I see a 2x speedup.

If you are interested, I can send you the code so you can extend your package with it.

Thanks,
Carlos.

PeerChristensen · 2021-11-15T10:58:45Z

Hi Carlos

I'm very pleased that you find it useful! Converting to data.table is a great idea, and I don't remember why I didn't do this in the first place. Please send your code. I'd be more than happy to try it and perhaps make a new release.

Best
Peer

coforfe · 2021-11-15T11:52:11Z

Hi Peer,

Yes, I have been playing with cohort_table_month() and cohort_table_day(), but I have not wrapped the code in functions yet.

Here is the code for each case:

cohort_table_month():

library(data.table)
library(tidytable)
library(cohorts)
library(zoo)

online_cohorts %>%
  as.data.table() %>%
  setnames( old = c('CustomerID', 'InvoiceDate'), new = c('id_var', 'date')) %>%
  .[ , month  := as.yearmon(date), by =  .(id_var)] %>%
  .[ , cohort := min(month), by =.(id_var)] %>%
  .[ , users := .N, by = .(cohort, month) ] %>%
  .[ , .(cohort, month, users)] %>%
  unique() %>%
  pivot_wider.( names_from = month, values_from = users) %>%
  .[ , cohort := 1:uniqueN(cohort)]

cohort_table_day():

library(data.table)
library(tidytable)
library(cohorts)
library(zoo)

 online_cohorts %>%
    as.data.table() %>%
    setnames( old = c('CustomerID', 'InvoiceDate'), new = c('id_var', 'date')) %>%
    .[ , cohort := min(date), by = .(id_var)] %>%
    .[ , users := .N, by = .(cohort, date) ] %>%
    .[ , .(cohort, date, users)] %>%
    unique() %>%
    pivot_wider.( names_from = date, values_from = users) %>%
    .[ , cohort := 1:uniqueN(cohort)]
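
To make the steps in the two pipelines above easier to follow, here is a minimal sketch on made-up data. It is not the package's implementation: the `orders` table is hypothetical, `format(date, "%Y-%m")` stands in for `zoo::as.yearmon()`, and `dcast()` from data.table stands in for `pivot_wider.()`.

```r
library(data.table)

# Hypothetical toy data: three customers with orders spread over three months.
orders <- data.table(
  id_var = c(1, 1, 2, 2, 3),
  date   = as.Date(c("2021-01-05", "2021-02-10", "2021-01-20",
                     "2021-03-02", "2021-02-15"))
)

# Month of each order as "YYYY-MM" (assumed stand-in for zoo::as.yearmon).
orders[, month := format(date, "%Y-%m")]

# Cohort = the first month in which each customer ordered.
orders[, cohort := min(month), by = id_var]

# Number of rows in each (cohort, month) cell.
cells <- orders[, .(users = .N), by = .(cohort, month)]

# Wide layout: one row per cohort, one column per month.
wide <- dcast(cells, cohort ~ month, value.var = "users")
print(wide)
# Cohort 2021-01 holds 2, 1, 1; cohort 2021-02 holds NA, 1, NA.
```

Note that `.N` counts rows, so a customer with several invoices in the same month is counted once per invoice. If distinct customers are what is wanted, `uniqueN(id_var)` would be the data.table way to count them.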

Thanks!
Carlos.

coforfe · 2021-11-15T11:59:59Z

Hello,

Just in case, here is the benchmarking code, which compares:

the data.table approach
the equivalent with dplyr
your function

library(cohorts)
library(data.table)
library(magrittr)
library(zoo)
library(tidytable)
library(dplyr)
library(microbenchmark)
library(tidyr)


microbenchmark(

#------------------------
# cohort_table_month()
#------------------------
data.table = online_cohorts %>%
  as.data.table() %>%
  setnames( old = c('CustomerID', 'InvoiceDate'), new = c('id_var', 'date')) %>%
  .[ , month  := as.yearmon(date), by = .(id_var)] %>%
  .[ , cohort := min(month), by = .(id_var)] %>%
  .[ , users := .N, by = .(cohort, month) ] %>%
  .[ , .(cohort, month, users)] %>%
  unique() %>%
  pivot_wider.( names_from = month, values_from = users) %>%
  .[ , cohort := 1:uniqueN(cohort)]
, 

dplyr =
online_cohorts %>% 
  rename( id_var = CustomerID) %>%
  rename( date = InvoiceDate) %>%
  group_by( id_var ) %>% 
  mutate(month = zoo::as.yearmon( date )) %>%
  mutate(cohort = min(month)) %>% 
  group_by(cohort, month) %>% 
  summarise(users = dplyr::n()) %>% 
  pivot_wider(names_from = month,  values_from = users) %>%
  ungroup() %>%
  mutate(cohort = 1:dplyr::n_distinct(cohort)) %>% 
  tibble::as_tibble()

,

funcion = online_cohorts %>%
  cohort_table_month(CustomerID, InvoiceDate)

, times = 10
)


#---------------------
# cohort_table_day()
#---------------------
microbenchmark(
  data.table = online_cohorts %>%
    as.data.table() %>%
    setnames( old = c('CustomerID', 'InvoiceDate'), new = c('id_var', 'date')) %>%
    .[ , cohort := min(date), by = .(id_var)] %>%
    .[ , users := .N, by = .(cohort, date) ] %>%
    .[ , .(cohort, date, users)] %>%
    unique() %>%
    pivot_wider.( names_from = date, values_from = users) %>%
    .[ , cohort := 1:uniqueN(cohort)]
  , 
  
  dplyr =
    online_cohorts %>% 
    rename( id_var = CustomerID) %>%
    rename( date = InvoiceDate) %>%
    group_by( id_var ) %>% 
    mutate(cohort = min(date)) %>% 
    group_by(cohort, date) %>% 
    summarise(users = n()) %>% 
    pivot_wider(names_from = date,  values_from = users) %>%
    ungroup() %>%
    mutate(cohort = 1:dplyr::n_distinct(cohort)) %>% 
    tibble::as_tibble()
  
  ,
  
  funcion = online_cohorts %>%
    cohort_table_day(CustomerID, InvoiceDate)
  
  , times = 10
)

Thanks,
Carlos.

coforfe · 2021-11-17T13:48:41Z

Hi,

Below are the previous transformations converted to functions, using pure data.table syntax (tidytable is no longer needed). I also tuned the cohort_table_pct() function a little.

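library(data.table)
library(magrittr)  # provides %>%
library(zoo)       # provides as.yearmon()

#----------------------  MONTH  --------------------
cohort_table_month_fast <- function(dt , customer, date) {
  #-- customer: name of the customer id column, passed as a character string (it is used with get()).
  #-- date: name of the date column, passed as a character string; the column should be of a date class.
  dt_out <- dt %>%
    as.data.table() %>%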
    .[ , month  := as.yearmon(get(date)), by =  .(get(customer))] %>%
    .[ , cohort := min(month), by =.(get(customer))] %>%
    .[ , users := .N, by = .(cohort, month) ] %>%
    .[ , .(cohort, month, users)] %>%
    unique() %>%
    dcast( cohort ~ month, value.var = "users" ) %>%
    .[ , cohort := 1:uniqueN(cohort) ] %>%
    as.data.table()
  
  return(dt_out)
}



#----------------------   PCT --------------------
cohort_table_pct_fast <- function( dt, decimals = 1) {
  
  diagonal <- dt %>%
    .[ , -"cohort", with = FALSE] %>%
    as.matrix() %>%
    diag()
  
  res_pct <- round(dt*100/diagonal, decimals) %>%
    .[ , cohort := 1:nrow(dt)] %>%
    as.data.table()
  
  return(res_pct)
}




#-------------------   DAY  -----------------
cohort_table_day_fast <- function(dt , customer, date) {
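  #-- customer: name of the customer id column, passed as a character string (it is used with get()).
  #-- date: name of the date column, passed as a character string; the column should be of a date class.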
  dt_out <- dt %>%
    as.data.table() %>%
    .[ , datetr := as.IDate(get(date))] %>%
    .[ , cohort := min(datetr), by =.(get(customer))] %>%
    .[ , users := .N, by = .(cohort, datetr) ] %>%
    .[ , .(cohort, datetr, users)] %>%
    unique() %>%
    dcast( cohort ~ datetr, value.var = "users") %>%
    .[ , cohort := 1:uniqueN(cohort)] %>%
    as.data.table()
  
  return(dt_out)
}
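
The percentage step in cohort_table_pct_fast() relies on R recycling the diagonal down the rows, so each row is divided by that cohort's own starting count. A minimal base R sketch of the arithmetic, using a made-up `counts` matrix and assuming a square table where each cohort's first period sits on the diagonal:

```r
# Hypothetical cohort counts: rows are cohorts, columns are periods.
counts <- matrix(
  c(10,  5,  2,
    NA,  8,  4,
    NA, NA,  6),
  nrow = 3, byrow = TRUE
)

# Size of each cohort in its first period (the diagonal).
cohort_size <- diag(counts)

# The vector has one entry per row, so element [i, j] is divided by cohort_size[i].
pct <- round(counts * 100 / cohort_size, 1)
print(pct)
```

Every diagonal entry becomes 100, and the entries to its right give the share of the cohort still active in later periods. If the table is not square, the diagonal no longer lines up with each cohort's first period, so this shortcut would need adjusting.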

Hope that it helps.

Thanks!
Carlos.

PeerChristensen · 2021-11-18T09:51:20Z

Hi Carlos

Thank you so much for your great work.
I'd love to use your improvements in a coming release and make sure to give you credit.

Best regards
Peer

coforfe · 2021-11-18T10:05:09Z

Thanks, Peer!
