cohorts converted to data.table/tidytable.... #1
Hi Carlos, I'm very pleased that you find it useful! Converting to data.table is a great idea, and I don't remember why I didn't do this in the first place. Please send your code. I'd be more than happy to try it and perhaps make a new release. Best
Hi Peer, Yes, I have been playing with Here is the code for each case:
library(data.table)
library(tidytable)
library(cohorts)
library(zoo)
online_cohorts %>%
as.data.table() %>%
setnames( old = c('CustomerID', 'InvoiceDate'), new = c('id_var', 'date')) %>%
.[ , month := as.yearmon(date), by = .(id_var)] %>%
.[ , cohort := min(month), by =.(id_var)] %>%
.[ , users := .N, by = .(cohort, month) ] %>%
.[ , .(cohort, month, users)] %>%
unique() %>%
pivot_wider.( names_from = month, values_from = users) %>%
.[ , cohort := 1:uniqueN(cohort)]
library(data.table)
library(tidytable)
library(cohorts)
library(zoo)
online_cohorts %>%
as.data.table() %>%
setnames( old = c('CustomerID', 'InvoiceDate'), new = c('id_var', 'date')) %>%
.[ , cohort := min(date), by = .(id_var)] %>%
.[ , users := .N, by = .(cohort, date) ] %>%
.[ , .(cohort, date, users)] %>%
unique() %>%
pivot_wider.( names_from = date, values_from = users) %>%
.[ , cohort := 1:uniqueN(cohort)]
Thanks!
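For readers who want to see what these pipelines compute, here is a minimal sketch on a toy dataset. It is an illustration under stated assumptions, not the package's implementation: the `orders` table and its column names are hypothetical, the month label is built with `format()` instead of `zoo::as.yearmon()`, and `dcast()` from data.table stands in for `pivot_wider.()`.

```r
library(data.table)

# Toy purchase log (hypothetical data, not from the cohorts package).
orders <- data.table(
  customer = c("a", "a", "b", "b", "c", "a"),
  day      = as.IDate(c("2021-01-05", "2021-02-10", "2021-01-20",
                        "2021-03-02", "2021-02-14", "2021-03-30"))
)

# Month label as "YYYY-MM"; stands in for zoo::as.yearmon() used in the thread.
orders[, month := format(day, "%Y-%m")]

# Cohort = first month in which each customer appears.
orders[, cohort := min(month), by = customer]

# Count rows per (cohort, month), as the thread's code does with .N,
# then spread the months into columns.
counts <- orders[, .(users = .N), by = .(cohort, month)]
cohort_wide <- dcast(counts, cohort ~ month, value.var = "users")

print(cohort_wide)
```

Each row of `cohort_wide` is one cohort and each remaining column is one month. Cells for months before a cohort's first month come out as `NA`, because no rows exist for that combination.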
Hello, Just in case, here is the benchmarking code, which compares the data.table version, the dplyr version and the package function:
library(cohorts)
library(data.table)
library(magrittr)
library(zoo)
library(tidytable)
library(dplyr)
library(microbenchmark)
library(tidyr)
microbenchmark(
#------------------------
# cohort_table_month()
#------------------------
data.table = online_cohorts %>%
as.data.table() %>%
setnames( old = c('CustomerID', 'InvoiceDate'), new = c('id_var', 'date')) %>%
.[ , month := as.yearmon(date), by = .(id_var)] %>%
.[ , cohort := min(month), by = .(id_var)] %>%
.[ , users := .N, by = .(cohort, month) ] %>%
.[ , .(cohort, month, users)] %>%
unique() %>%
pivot_wider.( names_from = month, values_from = users) %>%
.[ , cohort := 1:uniqueN(cohort)]
,
dplyr =
online_cohorts %>%
rename( id_var = CustomerID) %>%
rename( date = InvoiceDate) %>%
group_by( id_var ) %>%
mutate(month = zoo::as.yearmon( date )) %>%
mutate(cohort = min(month)) %>%
group_by(cohort, month) %>%
summarise(users = dplyr::n()) %>%
pivot_wider(names_from = month, values_from = users) %>%
ungroup() %>%
mutate(cohort = 1:dplyr::n_distinct(cohort)) %>%
tibble::as_tibble()
,
funcion = online_cohorts %>%
cohort_table_month(CustomerID, InvoiceDate)
, times = 10
)
#---------------------
# cohort_table_day()
#---------------------
microbenchmark(
data.table = online_cohorts %>%
as.data.table() %>%
setnames( old = c('CustomerID', 'InvoiceDate'), new = c('id_var', 'date')) %>%
.[ , cohort := min(date), by = .(id_var)] %>%
.[ , users := .N, by = .(cohort, date) ] %>%
.[ , .(cohort, date, users)] %>%
unique() %>%
pivot_wider.( names_from = date, values_from = users) %>%
.[ , cohort := 1:uniqueN(cohort)]
,
dplyr =
online_cohorts %>%
rename( id_var = CustomerID) %>%
rename( date = InvoiceDate) %>%
group_by( id_var ) %>%
mutate(cohort = min(date)) %>%
group_by(cohort, date) %>%
summarise(users = n()) %>%
pivot_wider(names_from = date, values_from = users) %>%
ungroup() %>%
mutate(cohort = 1:dplyr::n_distinct(cohort)) %>%
tibble::as_tibble()
,
funcion = online_cohorts %>%
cohort_table_day(CustomerID, InvoiceDate)
, times = 10
)
Thanks,
Hi, Attached you can find the previous transformations converted to functions and using pure
#---------------------- MONTH --------------------
cohort_table_month_fast <- function(dt , customer, date) {
#-- customer: name (character string) of the customer id column; it is looked up with get().
#-- date: name (character string) of a column of date class.
dt_out <- dt %>%
as.data.table() %>%
.[ , month := as.yearmon(get(date)), by = .(get(customer))] %>%
.[ , cohort := min(month), by =.(get(customer))] %>%
.[ , users := .N, by = .(cohort, month) ] %>%
.[ , .(cohort, month, users)] %>%
unique() %>%
dcast( cohort ~ month, value.var = "users" ) %>%
.[ , cohort := 1:uniqueN(cohort) ] %>%
as.data.table()
return(dt_out)
}
#---------------------- PCT --------------------
cohort_table_pct_fast <- function( dt, decimals = 1) {
diagonal <- dt %>%
.[ , -"cohort", with = FALSE] %>%
as.matrix() %>%
diag()
res_pct <- round(dt*100/diagonal, decimals) %>%
.[ , cohort := 1:nrow(dt)] %>%
as.data.table()
return(res_pct)
}
#------------------- DAY -----------------
cohort_table_day_fast <- function(dt , customer, date) {
#-- customer: name (character string) of the customer id column; it is looked up with get().
#-- date: name (character string) of a column of date class.
dt_out <- dt %>%
as.data.table() %>%
.[ , datetr := as.IDate(get(date))] %>%
.[ , cohort := min(datetr), by =.(get(customer))] %>%
.[ , users := .N, by = .(cohort, datetr) ] %>%
.[ , .(cohort, datetr, users)] %>%
unique() %>%
dcast( cohort ~ datetr, value.var = "users") %>%
.[ , cohort := 1:uniqueN(cohort)] %>%
as.data.table()
return(dt_out)
}
Hope it helps. Thanks!
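The percentage step in cohort_table_pct_fast divides every row by that row's diagonal entry, that is, by the cohort's count in its first period. The sketch below shows the same arithmetic on a small base R matrix. The numbers are made up for illustration, and a plain matrix is used here instead of a data.table to keep the example free of dependencies.

```r
# Hypothetical 3x3 cohort counts: rows are cohorts, columns are periods.
# NA marks periods before a cohort existed.
m <- matrix(c(10,  5,  2,
              NA,  8,  4,
              NA, NA,  6),
            nrow = 3, byrow = TRUE)

# diag(m) holds each cohort's starting size: 10, 8, 6.
# R recycles this length-3 vector down each column, so the element in
# row i is always divided by diag(m)[i].
pct <- round(m * 100 / diag(m), 1)

print(pct)
```

The diagonal of `pct` is 100 by construction, and the `NA` cells stay `NA`. This only works as a retention table when the table is square and each cohort first appears in the period matching its row, which is what the diagonal lookup assumes.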
Hi Carlos, Thank you so much for your great work. Best regards
Thanks, Peer!
Hello Peer,
Thanks for your package. I find it very useful and convenient.
Since I am using a large customer base, the process to get the monthly or the daily cohorts takes a little while, so I converted your code to data.table (and a little bit of tidytable) and I already see an improvement. Even for the small dataset you include in the package, online_cohorts, I see an improvement of 2x.
If you are interested, I can send you the code to extend your package with them.
Thanks,
Carlos.