Cole Kripke 10-second and 30-second variant: Does actigraph.sleepr facilitate aggregation based on max per minute? #9

Open
vincentvanhees opened this issue Feb 23, 2022 · 3 comments

Comments

@vincentvanhees

vincentvanhees commented Feb 23, 2022

Thanks for sharing your code.

I see you use the 60-second Cole Kripke algorithm as the default, but have also implemented the 10- and 30-second variants in the function apply_cole_kripke.

Do I understand correctly that to use those variants I would first have to pre-process my data to find the maximum 10- or 30-second epoch per minute, and then pass those values to the apply_cole_kripke function? At least, that is what Cole, Kripke et al. did in their paper. I have been trying to find out whether you already wrote such a pre-processing function, but couldn't find one.

If you could clarify that would be much appreciated.

@dipetkov
Owner

dipetkov commented Mar 12, 2022

Hello
As best as I can remember, I implemented the 10sec and 30sec versions from the equations in the Cole, Kripke et al. article, without any testing. That's why these versions are not exposed by the apply_cole_kripke function.

And, yes, I haven't implemented a pre-processing function to find the maximum 10s or 30s epoch per minute.

@dipetkov
Owner

dipetkov commented Mar 12, 2022

Now I'm wondering how to implement this preprocessing step.....

Can you verify that I understand correctly what "the maximum 10sec/30sec nonoverlapping epoch of activity per minute" means?

library("lubridate")
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
library("tidyverse")

# Create a minute of data for illustration
input <-
  tibble::tribble(
    ~timestamp, ~count,
    "2012-06-27 10:54:00", 377L,
    "2012-06-27 10:54:10", 465L,
    "2012-06-27 10:54:20", 505L,
    "2012-06-27 10:54:30", 73L,
    "2012-06-27 10:54:40", 45L,
    "2012-06-27 10:54:50", 0L
  ) %>%
  mutate(
    # Parse as date-times; as.Date would drop the time of day,
    # so every row would collapse onto the same timestamp
    across(timestamp, ymd_hms)
  )
input
#> # A tibble: 6 × 2
#>   timestamp           count
#>   <dttm>              <int>
#> 1 2012-06-27 10:54:00   377
#> 2 2012-06-27 10:54:10   465
#> 3 2012-06-27 10:54:20   505
#> 4 2012-06-27 10:54:30    73
#> 5 2012-06-27 10:54:40    45
#> 6 2012-06-27 10:54:50     0


# Find the maximum 10-second nonoverlapping epoch per minute

input %>%
  # The data is already at 10sec frequency,
  # so just find the 10sec window with the largest count
  group_by(
    timestamp = floor_date(timestamp, unit = "minute")
  ) %>%
  slice_max(
    count
  )
#> # A tibble: 1 × 2
#> # Groups:   timestamp [1]
#>   timestamp           count
#>   <dttm>              <int>
#> 1 2012-06-27 10:54:00   505


# Find the maximum 30-second nonoverlapping epoch per minute
input %>%
  # The data is at 10sec frequency,
  # so first we aggregate by summing the counts within each 30sec window
  group_by(
    timestamp = floor_date(timestamp, unit = "30 seconds")
  ) %>%
  summarise(
    across(count, sum)
  ) %>%
  # Then as above, we find the 30sec window with the largest count
  group_by(
    timestamp = floor_date(timestamp, unit = "minute")
  ) %>%
  slice_max(
    count
  )
#> # A tibble: 1 × 2
#> # Groups:   timestamp [1]
#>   timestamp           count
#>   <dttm>              <int>
#> 1 2012-06-27 10:54:00  1347

Created on 2022-03-12 by the reprex package (v2.0.1)

@vincentvanhees
Author

Sorry for the slow reply. I wasn't sure how to respond, as I also do not fully understand how they did it. In the non-overlapping variant, I think they worked with count values per 10 seconds, looked for the most active 10 seconds in each 60 seconds, and used that as the final indicator of movement per minute. What is unclear to me is whether they kept the unit of counts per 10 seconds or converted it to counts per minute. Maybe the fact that it is not mentioned means that they did not make the conversion.

The best solution may be to try both and see which one provides the better estimate.
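
To make the "try both" comparison concrete, here is a minimal sketch of a pre-processing helper that returns the per-minute maximum either in its native unit (counts per 10 or 30 seconds) or rescaled to counts per minute. The function name max_epoch_per_minute and its arguments window_sec and rescale are hypothetical and not part of actigraph.sleepr. The rescaling factor (60 / window_sec) is an assumption about what "converting to counts per minute" would mean; the paper does not confirm it.

```r
library(dplyr)
library(lubridate)

# Hypothetical helper (not part of actigraph.sleepr).
# Assumes `data` has a POSIXct `timestamp` column and a numeric `count`
# column, recorded at an epoch length that divides `window_sec`.
max_epoch_per_minute <- function(data, window_sec = 10, rescale = FALSE) {
  out <- data %>%
    # Sum the counts within each non-overlapping window
    group_by(
      window = floor_date(timestamp, unit = paste(window_sec, "seconds"))
    ) %>%
    summarise(count = sum(count), .groups = "drop") %>%
    # Keep the largest window total within each minute
    group_by(timestamp = floor_date(window, unit = "minute")) %>%
    summarise(count = max(count), .groups = "drop")
  if (rescale) {
    # Assumption: express the window total as an equivalent count per minute
    out <- mutate(out, count = count * 60 / window_sec)
  }
  out
}

# The same illustrative minute of 10-second data as in the reprex above
input <- tibble(
  timestamp = ymd_hms("2012-06-27 10:54:00") + seq(0, 50, by = 10),
  count = c(377L, 465L, 505L, 73L, 45L, 0L)
)

max_epoch_per_minute(input, window_sec = 10)                 # counts per 10 seconds
max_epoch_per_minute(input, window_sec = 10, rescale = TRUE) # scaled to per minute
max_epoch_per_minute(input, window_sec = 30)                 # counts per 30 seconds
```

Either output could then be scored and compared against a reference sleep/wake classification to see which unit gives the better estimate.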
