This is an automatic generator of fictitious databases, originally meant as a support tool for Biostatistics classes.
This script generates large, flexible databases that can be used to practice the concepts learned in Biostatistics (and Statistics) classes. It comprises a wide range of variables of all sorts, with underlying relations between them, designed to allow exploration. All values are initially randomized according to rules described on !!!ADD DOCUMENTATION!!!, many of them inspired by publicly available real data. Some easter eggs may be present: if the name of one of my friends is found in the table, that row's data will be forced to match theirs.
This project was inspired by my experience as a teaching assistant for the Med School Biostatistics class at Universidade Federal do Triângulo Mineiro in the 2022/1 semester. The assignment was to create a fictitious scientific paper, step by step, based on a real paper chosen by each student. However, many students had trouble using the paper's pre-existing variables and creating their own in a way that made the best use of their syllabus. They also struggled with manually populating their databases and with the distribution of their data. While helping one student automate the creation of her data, we started applying real or reality-inspired conditions to the random generators. I later decided to expand that idea into a large database flexible enough to remove the need for a source paper while still allowing variety across the entire class.
- [Status](#status)
Under construction
This application takes two basic variables: `n` and `seed`. The `n` variable sets how many observations the database will contain, and `seed` is used to guarantee reproducibility. Additionally, the chances (probabilities) for each variable may be tweaked as desired.
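
As a rough illustration of how `n` and `seed` might interact, here is a minimal Python sketch. It is not the project's actual code: the function name `generate_database`, the `smoker_chance` parameter, and the variables `age`, `smoker`, and `systolic` are all hypothetical, and the age and blood pressure relation is invented for the example.

```python
import random

def generate_database(n, seed, smoker_chance=0.2):
    """Return n fictitious observations; the same seed always yields the same rows."""
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    rows = []
    for i in range(n):
        age = rng.randint(18, 90)
        # Tweakable chance: probability that an observation is a smoker.
        smoker = rng.random() < smoker_chance
        # Hypothetical underlying relation: blood pressure rises with age.
        systolic = round(rng.gauss(100 + 0.5 * age, 10), 1)
        rows.append({"id": i + 1, "age": age, "smoker": smoker, "systolic": systolic})
    return rows

first = generate_database(n=5, seed=42)
second = generate_database(n=5, seed=42)
print(first == second)  # True: same n and seed give identical data
```

Using a dedicated `random.Random(seed)` instance rather than the module-level functions keeps the generated data independent of any other random calls made in the same session.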
The resulting database was originally intended to be used as a population from which samples can be taken, as a set of samples created for each student, or as a mix of both. I suggest using a large `n` to create a population database, from which each student devises a particular analysis plan. This is followed by a first sampling step, which stands in for data available from the literature or from a small study and is used to calculate the sample size for the proper analysis, and then by a second sampling step for the actual analysis project. The application is designed so that students may analyze the data as-is or adjust their population's attributes as desired.
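
The suggested workflow (population, pilot sample, sample size calculation, study sample) could be sketched in Python as follows. This is only an assumption-laden example: the population is a stand-in list of simulated systolic pressures rather than output from this application, and the sample size formula shown is the usual one for estimating a mean within a chosen margin of error, picked here purely for illustration.

```python
import math
import random
import statistics

rng = random.Random(2022)  # fixed seed for reproducibility

# Stand-in population: 10,000 fictitious systolic pressures (hypothetical variable).
population = [rng.gauss(125, 15) for _ in range(10_000)]

# Step 1: a small pilot sample, standing in for literature data or a small study.
pilot = rng.sample(population, 30)
pilot_sd = statistics.stdev(pilot)

# Step 2: sample size needed to estimate the mean within +/- 2 units
# at 95% confidence, using n = (z * s / E)^2.
z, margin = 1.96, 2.0
required_n = math.ceil((z * pilot_sd / margin) ** 2)

# Step 3: a second draw from the population for the actual analysis.
study_sample = rng.sample(population, required_n)
print(required_n, round(statistics.mean(study_sample), 1))
```

Each student could use a different seed for the sampling steps, so that the whole class shares one population while every analysis starts from different data.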