Automate testing of thrive lines #27

eatyourpeas · 2020-06-13T12:21:21Z

It would be useful to automate a process which creates serial data points that mimic the growth of a child for testing purposes

statist7 · 2020-06-13T16:22:41Z

Here is some R code to do that, using the weight correlation matrix I sent you a few days ago. I don't know how to interpret R in Python, but I'm sure it's easy to do!

The code produces a matrix with 10 rows (each corresponding to one subject) and 13 columns, which are ages 0 to 12 months. The weight z-scores need back-transforming to weight, using whichever sex you choose.

If you want random rather than fixed ages that will involve interpolating between the whole-month correlations, which is more involved.

  library(MASS)
  R <- as.matrix(read.csv('~/Dropbox/CIGS/RCPCH weight correlation matrix by month.csv'))[, -1]
  diag(R) <- 1
  nvars <- dim(R)[[1]]
  nsubs <- 10 # no of subjects
  X <- mvrnorm(nsubs, rep(0, nvars), R)
  # each row is one random subject's weight z-scores from 0 to 12 months

^{Created on 2020-06-13 by the reprex package (v0.3.0)}

eatyourpeas · 2020-06-14T11:51:30Z

That is hilariously concise @statist7 you are a remarkable guy. I have created a function here called:
def create_fictional_child(sex: str, measurement_type: str, requested_sds: float, number_of_measurements: int, starting_decimal_age: float, measurement_interval_value: float, measurement_interval_type: str, gestation_weeks = 0, gestation_days = 0, drift: bool = False, drift_sds_range: float = 0.0):
that allows the user to prescribe number of data points, interval between and starting age. I have also put in an option to introduce a degree of drift, largely to test the weight correlation function I put in using your matrices. So far I am only using the months but using the weeks should only be a question of swapping over. I am very grateful for this snippet - I will see if I can incorporate your more elegant solution.
As for weight correlations I did struggle with the bilinear interpolation but it does not generate a number. Not sure how accurate? Perhaps if you have time to look at mine you could give me pointers on what to change? See the correlate_weight function at line 114 onwards.

statist7 · 2020-06-15T15:49:42Z

I slightly struggled with your code, so instead I've expanded the R code to handle a set of random ages, or the ages can alternatively be provided directly. The required correlation matrix is linearly interpolated using the akima package, so the code needs to be run in R rather than translated to Python. Note we can easily switch to spline interpolation, though it should make very little difference.

  library(MASS)
  library(akima)

  t <- 0:12 # ages in correlation matrix (months)
  nt <- 5 # number of measurements to simulate
  nsubs <- 1 # one subject
  t0 <- sort(runif(nt, min(t), max(t))) # generate ordered random ages
  R <- as.matrix(read.csv('~/Dropbox/CIGS/RCPCH weight correlation matrix by month.csv'))[, -1]

  xyz <- cbind(expand.grid(x = t, y = t), z = c(R)) # grid of ages and correlation matrix
  corr <- with(xyz, interp(x, y, z, t0, t0))$z # interpolate correlations for measurement ages
  diag(corr) <- 1
  X <- mvrnorm(nsubs, rep(0, nt), corr) # simulated growth curve as z-scores at ages t0

  # to check the code, set ages to reference ages with 1000 subjects and
  # check that simulated correlation matrix matches reference correlation matrix

  t0 <- t
  nt <- length(t0)
  nsubs <- 1000
  corr <- with(xyz, interp(x, y, z, t0, t0))$z
  diag(corr) <- 1
  X <- mvrnorm(nsubs, rep(0, nt), corr)
  round(corr - cor(X), 2) # correlation errors only ~0.01
#>       [,1]  [,2]  [,3]  [,4]  [,5]  [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
#>  [1,] 0.00  0.00  0.01  0.00  0.00  0.02 0.02 0.04 0.04  0.04  0.04  0.05  0.04
#>  [2,] 0.00  0.00 -0.01 -0.01 -0.01 -0.01 0.00 0.01 0.02  0.02  0.02  0.03  0.02
#>  [3,] 0.01 -0.01  0.00  0.00  0.00  0.00 0.00 0.02 0.02  0.01  0.02  0.03  0.02
#>  [4,] 0.00 -0.01  0.00  0.00  0.00  0.00 0.00 0.01 0.01  0.00  0.01  0.02  0.01
#>  [5,] 0.00 -0.01  0.00  0.00  0.00  0.00 0.00 0.01 0.00  0.00  0.00  0.01  0.00
#>  [6,] 0.02 -0.01  0.00  0.00  0.00  0.00 0.00 0.00 0.00  0.00  0.00  0.01  0.00
#>  [7,] 0.02  0.00  0.00  0.00  0.00  0.00 0.00 0.00 0.00  0.00  0.00  0.01  0.00
#>  [8,] 0.04  0.01  0.02  0.01  0.01  0.00 0.00 0.00 0.00  0.00  0.00  0.01  0.01
#>  [9,] 0.04  0.02  0.02  0.01  0.00  0.00 0.00 0.00 0.00  0.00  0.01  0.01  0.01
#> [10,] 0.04  0.02  0.01  0.00  0.00  0.00 0.00 0.00 0.00  0.00  0.00  0.01  0.00
#> [11,] 0.04  0.02  0.02  0.01  0.00  0.00 0.00 0.00 0.01  0.00  0.00  0.01  0.00
#> [12,] 0.05  0.03  0.03  0.02  0.01  0.01 0.01 0.01 0.01  0.01  0.01  0.00  0.01
#> [13,] 0.04  0.02  0.02  0.01  0.00  0.00 0.00 0.01 0.01  0.00  0.00  0.01  0.00

^{Created on 2020-06-15 by the reprex package (v0.3.0)}

eatyourpeas · 2020-06-16T07:47:21Z

This is very elegant. There is an akima module with in the SciPy package. I will try and convert. Might need to reach out to you separately to make it work.

statist7 · 2020-06-16T08:47:40Z

That's good news.

I've found this SO query about mvrnorm that includes a Python in R solution.
https://stackoverflow.com/questions/31666270/python-analog-of-mvrnorms-empirical-setting-when-generating-a-multivariate-dist

But ignore the empirical = True.

eatyourpeas · 2020-11-22T11:53:21Z

Not sure how this got logged as completed. The title is also misleading. This is fundamentally about testing thrive lines, so I have changed the title. It is still on the wish list to implement once MVP is complete. I am leaving open but moving to roadmap-future for the moment.

dc2007git · 2024-02-05T13:18:17Z

@eatyourpeas was this actioned?

eatyourpeas added feature-request New feature or request mvp required for MVP launch nice-to-have A feature which would be nice to have but is not essential labels Jun 13, 2020

eatyourpeas self-assigned this Jun 13, 2020

eatyourpeas changed the title ~~Automate creation of prescribed dummy data~~ Automate creation of serial growth data for testing thrive lines Nov 22, 2020

eatyourpeas added not-for-mvp Not required for MVP launch and removed mvp required for MVP launch labels Nov 22, 2020

pacharanero changed the title ~~Automate creation of serial growth data for testing thrive lines~~ Automate testing of thrive lines Apr 2, 2022

anchit-chandran pushed a commit that referenced this issue Apr 3, 2024

Update chart-information-health-staff.md (#27)

24cdb6f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Automate testing of thrive lines #27

Automate testing of thrive lines #27

eatyourpeas commented Jun 13, 2020

statist7 commented Jun 13, 2020

eatyourpeas commented Jun 14, 2020 •

edited

Loading

statist7 commented Jun 15, 2020

eatyourpeas commented Jun 16, 2020

statist7 commented Jun 16, 2020

eatyourpeas commented Nov 22, 2020 •

edited

Loading

dc2007git commented Feb 5, 2024

Automate testing of thrive lines #27

Automate testing of thrive lines #27

Comments

eatyourpeas commented Jun 13, 2020

statist7 commented Jun 13, 2020

eatyourpeas commented Jun 14, 2020 • edited Loading

statist7 commented Jun 15, 2020

eatyourpeas commented Jun 16, 2020

statist7 commented Jun 16, 2020

eatyourpeas commented Nov 22, 2020 • edited Loading

dc2007git commented Feb 5, 2024

eatyourpeas commented Jun 14, 2020 •

edited

Loading

eatyourpeas commented Nov 22, 2020 •

edited

Loading