statDIST

The goal of statDIST is to make learning fundamental statistics concepts easier. At the core of this package is the sample_generator() function, which allows one to generate any number of samples containing any number of measurements up to one less than a hypothetical population distribution that is user-defined.

The fundamental premise is that seeing data as it is transformed is much better than memorizing formulas.

Installation

You can install the development version of statDIST from GitHub with:

# install.packages("devtools")
devtools::install_github("CodetoDrum/statDIST")

Example

Everything starts by generating samples to experiment with. The initial creation of this function was motivated by a desire to generate sampling distributions to actually see the concept in action.

The function is set so that any given sample within the resulting data frame contains unique samples from the hypothetical population distribution. A big reason for this is to simulate the realities of sampling in the real world. When research is done, one is generally taking a single data set worth of measurements to then utilize for inference and prediction. And this leads into the understanding that the combination formula provides.

$n C k$

When we sample a population, the resulting data set represents one of some total number of possible sets we could have collected for the given number of measurements we actually took. We have n total measurements available to take, but we only choose k of those measurements.

So, sample_data() effectively generates as many parallel sample sets for the same sample sizes from the same population. In the chunk below, we are basically saying create 10 parallel instances of the same means of sampling, collect 25 measurements in each of those instances from some hypothetical population with 500 possible measurements that can range from 0-250.

test_samples <- statDIST::sample_data(
  num_samples = 10,
  sample_size = 25,
  dist_size = 500,
  max_val = 250
)

head(test_samples$example_samples)
#>           V1       V2       V3          V4          V5        V6        V7
#> 1  92.086033 223.8434 164.4644 238.1381530 169.4489792 179.83298 167.00162
#> 2 192.180885 155.0948 240.6190  99.1509288 147.3917305  24.75027 113.98296
#> 3  74.555278 134.2413  41.4872 124.8308506   0.8874497  39.99445 104.02219
#> 4  90.113901 240.9650 249.3122 243.4629253 101.2703619 170.69763 128.43728
#> 5 149.085691 222.8487 223.3556   0.8874497 175.0157978  55.42809  70.11124
#> 6   5.278202 202.0703 176.2667 197.2487703 234.0841477 163.24607 178.53734
#>          V8        V9       V10
#> 1  98.43101  35.72676  67.54563
#> 2 200.41341 116.77145  37.91362
#> 3 172.17115  12.00192 142.55826
#> 4  24.39868 191.15153 192.18088
#> 5 116.38648 237.16065  33.98742
#> 6 186.76996 243.46293  22.23917

In this case, relative to our combinations intuition, we have n = 500 measurements, choosing k = 25 of them in 10 total instances. For this setup, you can view the total number of UNIQUE combinations of 25 measurements from a population of 500 units using a combination calculator. From there, you know we have 10 parallel instances of this total set, which if run shows that we still have a very small subset of the total possible instances we could have sampled.

However, we can use our generator function with other package functions to demonstrate what happens to distributions of statistics to any situation we wish to simulate relative to how those statistics approximate the true underlying population parameters.

Because the information from the hypothetical population is core to understanding other package outputs, those data are also housed in the output of sample_data(). It is important to not the output of this function comes as a list when using other package functions (which will very likely require data frames or vectors to work).

head(test_samples[[c("population_dist")]], n = 25)
#>  [1]  11.070913  78.751484 233.128789 229.008823  58.187088   5.278202
#>  [7]  39.994450  19.998839 104.864515 179.832978  58.891338 147.275206
#> [13] 235.539738 167.001618   2.738908 222.848717 102.426545  78.883112
#> [19]  23.356882 211.203585 121.699438  38.230661 226.556425 136.466689
#> [25]  87.295502

Name	Name	Last commit message	Last commit date
Latest commit CodetoDrum Update parameter description for samp_data Mar 12, 2023 19e918a · Mar 12, 2023 History 8 Commits
R	R	Update parameter description for samp_data	Mar 12, 2023
man	man	Bulk Upload	Mar 8, 2023
tests	tests	Bulk Upload	Mar 8, 2023
.Rbuildignore	.Rbuildignore	Bulk Upload	Mar 8, 2023
.gitignore	.gitignore	Initial commit	Mar 7, 2023
DESCRIPTION	DESCRIPTION	Bulk Upload	Mar 8, 2023
LICENSE	LICENSE	Bulk Upload	Mar 8, 2023
LICENSE.md	LICENSE.md	Bulk Upload	Mar 8, 2023
NAMESPACE	NAMESPACE	Bulk Upload	Mar 8, 2023
README.Rmd	README.Rmd	Update code chunks to reflect function updates	Mar 12, 2023
README.md	README.md	Update code chunks to reflect function updates	Mar 12, 2023
statDIST.Rproj	statDIST.Rproj	Initial commit	Mar 7, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Licenses found

Repository files navigation

statDIST

Installation

Example

About

Licenses found

Releases

Packages

Languages

License

CodetoDrum/statDIST

Folders and files

Latest commit

History

Repository files navigation

statDIST

Installation

Example

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages