Automate testing of thrive lines #27

Open
eatyourpeas opened this issue Jun 13, 2020 · 7 comments
Labels: feature-request (New feature or request), nice-to-have (A feature which would be nice to have but is not essential), not-for-mvp (Not required for MVP launch)

Comments

@eatyourpeas
Member

It would be useful to automate a process that creates serial data points mimicking the growth of a child, for testing purposes.

@eatyourpeas eatyourpeas added feature-request (New feature or request), mvp (required for MVP launch) and nice-to-have (A feature which would be nice to have but is not essential) labels Jun 13, 2020
@eatyourpeas eatyourpeas self-assigned this Jun 13, 2020
@statist7
Contributor

Here is some R code to do that, using the weight correlation matrix I sent you a few days ago. I don't know how to interpret R in Python, but I'm sure it's easy to do!

The code produces a matrix with 10 rows (each corresponding to one subject) and 13 columns, which are ages 0 to 12 months. The weight z-scores need back-transforming to weight, using whichever sex you choose.
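A minimal Python sketch of that back-transformation, assuming the growth reference is supplied as LMS parameters (L, M, S) for each sex and age, as growth references commonly are. The helper name measurement_from_sds is hypothetical, and the LMS numbers in the example call are placeholders rather than real reference values:

```python
import math


def measurement_from_sds(z: float, l: float, m: float, s: float) -> float:
    """Back-transform an SDS (z-score) to a measurement with the LMS formula.

    l, m and s are the Box-Cox power, median and coefficient of variation
    for the chosen sex and age; looking them up is outside this sketch.
    """
    if abs(l) < 1e-12:
        # Limiting case L -> 0: M * exp(S * z)
        return m * math.exp(s * z)
    base = 1.0 + l * s * z
    if base <= 0:
        # The LMS curve is undefined this far into the tail
        raise ValueError("z-score too extreme for these LMS parameters")
    return m * base ** (1.0 / l)


# Placeholder LMS values for illustration only, not taken from any reference
weight_kg = measurement_from_sds(z=-1.5, l=0.2, m=7.9, s=0.11)
```

Looking up L, M and S for the chosen sex at each simulated age is left out, since it depends on how the reference data are stored.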

If you want random rather than fixed ages, that will involve interpolating between the whole-month correlations, which is more involved.

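  library(MASS) # provides mvrnorm
  # read the month-by-month weight correlation matrix, dropping the first (label) column
  R <- as.matrix(read.csv('~/Dropbox/CIGS/RCPCH weight correlation matrix by month.csv'))[, -1]
  diag(R) <- 1 # set the diagonal to exactly 1
  nvars <- dim(R)[[1]] # number of ages (13: 0 to 12 months)
  nsubs <- 10 # no of subjects
  X <- mvrnorm(nsubs, rep(0, nvars), R) # zero-mean correlated z-scores
  # each row is one random subject's weight z-scores from 0 to 12 months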

Created on 2020-06-13 by the reprex package (v0.3.0)
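For anyone wanting to stay in Python, a rough NumPy equivalent of the snippet above might look like this. It is a sketch, not a tested port: it assumes the CSV has a header row and a first label column (mirroring [, -1]), and it swaps MASS::mvrnorm for numpy.random.Generator.multivariate_normal. The function names and the file name are hypothetical:

```python
from typing import Optional

import numpy as np


def load_correlation_matrix(path: str) -> np.ndarray:
    """Read a square correlation matrix saved as CSV with a header row and a label column."""
    raw = np.genfromtxt(path, delimiter=",", skip_header=1)
    r = raw[:, 1:].copy()  # drop the first (label) column, like [, -1] in R
    np.fill_diagonal(r, 1.0)  # diag(R) <- 1
    return r


def simulate_z_scores(r: np.ndarray, nsubs: int, seed: Optional[int] = None) -> np.ndarray:
    """Return an nsubs x nvars array; each row is one simulated subject's z-scores."""
    rng = np.random.default_rng(seed)
    nvars = r.shape[0]
    # Zero means and the correlation matrix as covariance, as in mvrnorm(nsubs, rep(0, nvars), R)
    return rng.multivariate_normal(np.zeros(nvars), r, size=nsubs)


# Hypothetical file name; substitute the real correlation CSV
# X = simulate_z_scores(load_correlation_matrix("weight_correlation_by_month.csv"), nsubs=10)
```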

@eatyourpeas
Member Author

eatyourpeas commented Jun 14, 2020

That is hilariously concise, @statist7; you are a remarkable guy. I have created a function here called:

  def create_fictional_child(sex: str, measurement_type: str, requested_sds: float, number_of_measurements: int, starting_decimal_age: float, measurement_interval_value: float, measurement_interval_type: str, gestation_weeks = 0, gestation_days = 0, drift: bool = False, drift_sds_range: float = 0.0):

that allows the user to prescribe the number of data points, the interval between them and the starting age. I have also put in an option to introduce a degree of drift, largely to test the weight correlation function I put in using your matrices. So far I am only using the months, but using the weeks should only be a question of swapping over. I am very grateful for this snippet; I will see if I can incorporate your more elegant solution.
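The signature suggests roughly how such a generator can work. As an illustration only, and not the project's create_fictional_child, here is a stripped-down hypothetical helper that returns (decimal age, SDS) pairs with optional linear drift. It assumes the interval has already been converted to years and leaves out sex, measurement type, gestation and the conversion from SDS to measurements:

```python
from typing import List, Optional, Tuple

import numpy as np


def fictional_sds_series(
    requested_sds: float,
    number_of_measurements: int,
    starting_decimal_age: float,
    interval_years: float,
    drift: bool = False,
    drift_sds_range: float = 0.0,
    seed: Optional[int] = None,
) -> List[Tuple[float, float]]:
    """Return (decimal_age, sds) pairs for one fictional child (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    ages = starting_decimal_age + interval_years * np.arange(number_of_measurements)
    if drift and number_of_measurements > 1:
        # Draw a total drift uniformly within +/- drift_sds_range and spread it
        # linearly from the first visit (no drift) to the last (full drift).
        total_drift = rng.uniform(-drift_sds_range, drift_sds_range)
        sds = requested_sds + np.linspace(0.0, total_drift, number_of_measurements)
    else:
        sds = np.full(number_of_measurements, requested_sds)
    return [(float(a), float(z)) for a, z in zip(ages, sds)]


# Five monthly measurements starting at birth, drifting by up to 1 SDS
series = fictional_sds_series(-0.5, 5, 0.0, 1 / 12, drift=True, drift_sds_range=1.0, seed=0)
```

Linear drift is just one simple choice; drawing each visit's SDS from the weight correlation matrix would give more realistic trajectories.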
As for weight correlations, I did struggle with the bilinear interpolation, but it does now generate a number. I am not sure how accurate it is, though. If you have time to look at mine, perhaps you could give me pointers on what to change? See the correlate_weight function at line 114 onwards.
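One way to sanity-check a hand-written bilinear interpolation such as correlate_weight is to confirm that it reproduces the matrix exactly at whole-month ages and is symmetric in its two arguments. A generic NumPy sketch (bilinear_correlation is hypothetical, not the project's code), assuming the correlation matrix is indexed by an increasing grid of ages:

```python
import numpy as np


def bilinear_correlation(r: np.ndarray, grid: np.ndarray, age_a: float, age_b: float) -> float:
    """Bilinearly interpolate the correlation between two ages from a gridded matrix.

    r[i, j] is the correlation between grid[i] and grid[j]; grid must be increasing.
    """
    if not (grid[0] <= age_a <= grid[-1] and grid[0] <= age_b <= grid[-1]):
        raise ValueError("ages must lie within the grid; this sketch does not extrapolate")
    if age_a == age_b:
        return 1.0  # a measurement is perfectly correlated with itself

    def bracket(age: float):
        # Lower grid index and fractional position between grid[i] and grid[i + 1]
        i = int(np.searchsorted(grid, age, side="right")) - 1
        i = min(max(i, 0), len(grid) - 2)
        return i, (age - grid[i]) / (grid[i + 1] - grid[i])

    i, fa = bracket(age_a)
    j, fb = bracket(age_b)
    # Weighted average of the four surrounding grid correlations
    return float(
        (1 - fa) * (1 - fb) * r[i, j]
        + fa * (1 - fb) * r[i + 1, j]
        + (1 - fa) * fb * r[i, j + 1]
        + fa * fb * r[i + 1, j + 1]
    )
```

Between grid points each value is a weighted average of the four surrounding correlations, with weights given by the fractional distances along each age axis.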

@statist7
Contributor

I slightly struggled with your code, so instead I've expanded the R code to handle a set of random ages, or the ages can alternatively be provided directly. The required correlation matrix is linearly interpolated using the akima package, so the code needs to be run in R rather than translated to Python. Note we can easily switch to spline interpolation, though it should make very little difference.

  library(MASS)
  library(akima)

  t <- 0:12 # ages in correlation matrix (months)
  nt <- 5 # number of measurements to simulate
  nsubs <- 1 # one subject
  t0 <- sort(runif(nt, min(t), max(t))) # generate ordered random ages
  R <- as.matrix(read.csv('~/Dropbox/CIGS/RCPCH weight correlation matrix by month.csv'))[, -1]

  xyz <- cbind(expand.grid(x = t, y = t), z = c(R)) # grid of ages and correlation matrix
  corr <- with(xyz, interp(x, y, z, t0, t0))$z # interpolate correlations for measurement ages
  diag(corr) <- 1
  X <- mvrnorm(nsubs, rep(0, nt), corr) # simulated growth curve as z-scores at ages t0

  # to check the code, set ages to reference ages with 1000 subjects and
  # check that simulated correlation matrix matches reference correlation matrix

  t0 <- t
  nt <- length(t0)
  nsubs <- 1000
  corr <- with(xyz, interp(x, y, z, t0, t0))$z
  diag(corr) <- 1
  X <- mvrnorm(nsubs, rep(0, nt), corr)
  round(corr - cor(X), 2) # correlation errors mostly <= 0.02 (max 0.05)
#>       [,1]  [,2]  [,3]  [,4]  [,5]  [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
#>  [1,] 0.00  0.00  0.01  0.00  0.00  0.02 0.02 0.04 0.04  0.04  0.04  0.05  0.04
#>  [2,] 0.00  0.00 -0.01 -0.01 -0.01 -0.01 0.00 0.01 0.02  0.02  0.02  0.03  0.02
#>  [3,] 0.01 -0.01  0.00  0.00  0.00  0.00 0.00 0.02 0.02  0.01  0.02  0.03  0.02
#>  [4,] 0.00 -0.01  0.00  0.00  0.00  0.00 0.00 0.01 0.01  0.00  0.01  0.02  0.01
#>  [5,] 0.00 -0.01  0.00  0.00  0.00  0.00 0.00 0.01 0.00  0.00  0.00  0.01  0.00
#>  [6,] 0.02 -0.01  0.00  0.00  0.00  0.00 0.00 0.00 0.00  0.00  0.00  0.01  0.00
#>  [7,] 0.02  0.00  0.00  0.00  0.00  0.00 0.00 0.00 0.00  0.00  0.00  0.01  0.00
#>  [8,] 0.04  0.01  0.02  0.01  0.01  0.00 0.00 0.00 0.00  0.00  0.00  0.01  0.01
#>  [9,] 0.04  0.02  0.02  0.01  0.00  0.00 0.00 0.00 0.00  0.00  0.01  0.01  0.01
#> [10,] 0.04  0.02  0.01  0.00  0.00  0.00 0.00 0.00 0.00  0.00  0.00  0.01  0.00
#> [11,] 0.04  0.02  0.02  0.01  0.00  0.00 0.00 0.00 0.01  0.00  0.00  0.01  0.00
#> [12,] 0.05  0.03  0.03  0.02  0.01  0.01 0.01 0.01 0.01  0.01  0.01  0.00  0.01
#> [13,] 0.04  0.02  0.02  0.01  0.00  0.00 0.00 0.01 0.01  0.00  0.00  0.01  0.00

Created on 2020-06-15 by the reprex package (v0.3.0)
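A possible Python translation of the R code above, as a sketch. It uses scipy.interpolate.RegularGridInterpolator for bilinear interpolation, whereas akima::interp works on a triangulation of the data points, so values between whole months may differ slightly (both reproduce the matrix at the grid ages). The matrix below is a synthetic stand-in, 0.9 raised to the month difference, used only so the sketch runs; the function names are hypothetical:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator


def interpolated_correlation(r: np.ndarray, grid: np.ndarray, ages: np.ndarray) -> np.ndarray:
    """Correlation matrix at arbitrary ages, bilinearly interpolated from the grid matrix r."""
    interp = RegularGridInterpolator((grid, grid), r, method="linear")
    aa, bb = np.meshgrid(ages, ages, indexing="ij")
    corr = interp(np.column_stack([aa.ravel(), bb.ravel()])).reshape(len(ages), len(ages))
    np.fill_diagonal(corr, 1.0)  # diag(corr) <- 1
    return corr


def simulate_growth(r, grid, ages, nsubs, rng):
    """Simulated z-score curves (one row per subject) at the given ages."""
    corr = interpolated_correlation(r, grid, ages)
    return rng.multivariate_normal(np.zeros(len(ages)), corr, size=nsubs)


def max_correlation_error(r, grid, nsubs, rng):
    """The check from the R code: simulate at the grid ages and compare sample correlations."""
    corr = interpolated_correlation(r, grid, grid)
    x = simulate_growth(r, grid, grid, nsubs, rng)
    return float(np.max(np.abs(corr - np.corrcoef(x, rowvar=False))))


rng = np.random.default_rng(2020)
grid = np.arange(13.0)  # ages in the correlation matrix (months)
# Synthetic stand-in for the real matrix, used only so the sketch runs
r = 0.9 ** np.abs(np.subtract.outer(grid, grid))
ages = np.sort(rng.uniform(grid.min(), grid.max(), 5))  # five ordered random ages
x = simulate_growth(r, grid, ages, 1, rng)  # one subject, like nsubs <- 1
```

Swapping the synthetic r for the real matrix loaded from the CSV and running max_correlation_error with 1000 subjects mirrors the round(corr - cor(X), 2) check.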

@eatyourpeas
Member Author

This is very elegant. There is an Akima interpolator within the SciPy package, so I will try to convert the code. I might need to reach out to you separately to make it work.
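A note on the SciPy side: as far as I know, SciPy's Akima interpolator (scipy.interpolate.Akima1DInterpolator) is one-dimensional, so it is not a direct replacement for the two-dimensional akima::interp call. If the spline interpolation mentioned earlier in the thread is preferred, a sketch using scipy.interpolate.RectBivariateSpline could look like this (spline_correlation is a hypothetical name):

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline


def spline_correlation(r: np.ndarray, grid: np.ndarray, ages: np.ndarray) -> np.ndarray:
    """Bicubic-spline interpolation of a gridded correlation matrix at sorted ages."""
    if np.any(np.diff(ages) < 0):
        raise ValueError("ages must be in increasing order")
    # s=0 makes the spline pass through every grid value
    spline = RectBivariateSpline(grid, grid, r, kx=3, ky=3, s=0)
    corr = spline(ages, ages)  # evaluates on the ages x ages grid
    corr = 0.5 * (corr + corr.T)  # remove any tiny numerical asymmetry
    np.fill_diagonal(corr, 1.0)
    return np.clip(corr, -1.0, 1.0)  # splines can overshoot slightly
```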

@statist7
Contributor

That's good news.

I've found this Stack Overflow question about mvrnorm that includes a Python equivalent of the R function:
https://stackoverflow.com/questions/31666270/python-analog-of-mvrnorms-empirical-setting-when-generating-a-multivariate-dist

But ignore the empirical = TRUE setting.
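MASS::mvrnorm draws its samples via an eigendecomposition of the covariance matrix, which tolerates matrices that are only just positive semi-definite; that can matter for interpolated correlation matrices. A NumPy sketch of the same approach without the empirical option; the helper name and its tolerance check loosely follow mvrnorm's behaviour but are assumptions, not a port of its source:

```python
import numpy as np


def mvrnorm(n: int, mu, sigma, rng: np.random.Generator, tol: float = 1e-6) -> np.ndarray:
    """Draw n rows from N(mu, sigma) via an eigendecomposition, in the style of MASS::mvrnorm."""
    sigma = np.asarray(sigma, dtype=float)
    p = sigma.shape[0]
    ev, vecs = np.linalg.eigh(sigma)  # eigenvalues in ascending order
    # Reject clearly indefinite matrices, tolerating tiny negative eigenvalues
    if np.any(ev < -tol * np.abs(ev).max()):
        raise ValueError("sigma is not positive definite")
    # V * sqrt(D): scale each eigenvector column by the root of its eigenvalue
    root = vecs * np.sqrt(np.clip(ev, 0.0, None))
    z = rng.standard_normal((n, p))
    return np.asarray(mu, dtype=float) + z @ root.T
```

Since root @ root.T equals sigma, each row has the requested covariance.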

@eatyourpeas
Member Author

eatyourpeas commented Nov 22, 2020

Not sure how this got logged as completed. The title is also misleading: this is fundamentally about testing thrive lines, so I have changed the title. It is still on the wish list to implement once the MVP is complete. I am leaving it open but moving it to roadmap-future for the moment.

@eatyourpeas eatyourpeas changed the title from "Automate creation of prescribed dummy data" to "Automate creation of serial growth data for testing thrive lines" Nov 22, 2020
@eatyourpeas eatyourpeas added not-for-mvp (Not required for MVP launch) and removed mvp (required for MVP launch) labels Nov 22, 2020
@pacharanero pacharanero changed the title from "Automate creation of serial growth data for testing thrive lines" to "Automate testing of thrive lines" Apr 2, 2022
@dc2007git
Contributor

@eatyourpeas was this actioned?
