-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathlinear_regression_BMI_Base.Rmd
43 lines (34 loc) · 1.95 KB
/
linear_regression_BMI_Base.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
### BMI vs Total Base Stat
Now, we are gonna answer the following question: How does height and weight (in other words, BMI) of a Pokemon correlate with its various base stats?
To do so, we're gonna use a linear regression model that plots the BMI vs. the Total. Right now, the total is defined as the sum of all the base stats, which include HP, Attack, Defense, Sp. Atk, Sp. Def, and speed. BMI, or Body Mass Index, is calculated as the kg/m^2. Let's plot the graph of BMI vs. the Total Base Stats using ggplot().
```{r BMI}
library(tidyverse)
library(broom)
pokedex <- pokedex.table
pokedex %>%
ggplot(aes(x=as.numeric(BMI), y=Total)) +
geom_point() +
geom_smooth(method=lm) +
labs(title="BMI vs. Total Base Stats",
x = "BMI",
y = "Total Base Stats")
```
As you can see here, a lot of the data points are clustered to the left side, and there appears to be no linear trend among pokemon. The BMI of a Pokemon does not has little to no correlation on what its Total Base Stats may be. However, let's see what we get as the linear regression equation, we'll worry about whether it is accurate or not later.
```{r equation}
auto_fit <- lm(Total~as.numeric(BMI), data=pokedex.table)
auto_fit
```
As you can see, the linear regression equation is 45.6167 - 0.1614(x). How much confidence we can put in this equation, let's set up a 95% confidence interval.
```{r}
auto_fit_stats <- auto_fit %>%
tidy() %>%
select(term, estimate, std.error)
auto_fit_stats
```
```{r}
confidence_interval_offset <- 1.95 * auto_fit_stats$std.error[2]
confidence_interval <- round(c(auto_fit_stats$estimate[2] - confidence_interval_offset,
auto_fit_stats$estimate[2],
auto_fit_stats$estimate[2] + confidence_interval_offset), 4)
```
Given our confidence interval, we would say that on average, for every k/m^2, we are 95% confident that the base stats will lie within (-0.3392, 0.0164) fewer.