-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathNYC Guided project.Rmd
593 lines (422 loc) · 25.9 KB
/
NYC Guided project.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
---
title: "NYC Schools Survey Project"
author: "Fredrick Boshe"
date: "30/03/2021"
output:
github_document: default
rmarkdown::github_document: default
---
# An analysis of a surveyed perception of New York City school quality
This project looks into the academic performance of high schools in New York (2011) with additional demographic and socio-economic indicators. As i explore how performance was influenced with such indicators, i will also include a survey on the perceptions of students, teachers and parents on the quality of New York City schools.
I used data visualization to explore:
1. What is the relationship between demography and school performance in NYC?
**Goal:** Observe any differences in academic performance across NYC schools’ demographics.
2. What is the relationship between perceptions on academic performance and surrounding factors such as demographics and success metrics of schools?
**Goal:** Observe how perceptions vary across demographics and if these perceptions match actual school performances.
3. Are there any similarities to how survey respondents perceive the quality of NYC schools?
**Goal:** Observe any elements of echo chambers of opinions within groups and across groups
The survey data can be found [here](https://data.cityofnewyork.us/Education/2011-NYC-School-Survey/mnz3-dyi8) while the performance and demographics data can be found [here](https://data.world/dataquest/nyc-schools-data/workspace/file?filename=combined.csv).
```{r setup, include=FALSE, results='hide', message=FALSE, warning=FALSE}
#Loading packages
library(tidyverse)
library(ggplot2)
library(dplyr)
library(corrplot)
library(plotly)
library(ggcorrplot)
library(scales)
library(tufte)
devtools::install_github("hadley/emo")
```
```{r sys, include=FALSE}
Sys.setenv("plotly_username"="rickymarvel")
Sys.setenv("plotly_api_key"="RrlUUhrCU8EIF2f840Eb")
```
#### **Import and preview data**
```{r Data Import, include=TRUE, results='hide', message=FALSE, warning=FALSE }
combined<-read_csv("combined.csv")
gened<-read_tsv("masterfile11_gened_final.txt")
spced<-read_tsv("masterfile11_d75_final.txt")
```
Both gened dataframe (General education schools) and spced dataframe (Special education schools), have a key (dbn) that can be matched to the combined dataframe (performance and demographics).
Special attention to rename "dbn" to "DBN" so they can match. Also note that some variables such as "principal", are not useful to this analysis. We shall drop such columns. Read the dataset dictionary for more information
The survey data is also granular, we would like to obtain aggregate perceptions of respondents for the schools in question.
The factors that respondents were surveyed on were:
- Safety and Respect
- Communication
- Engagement
- Academic Expectations
The surveyed groups were:
- Parents
- Teachers
- Students
- Total (The average of parent, teacher, and student scores)
#### **Data cleaning and manipulation**
```{r cleaning,include=TRUE, results='hide', message=FALSE, warning=FALSE }
#Filter to remain with high school and aggregate columns only
gened_clean<-gened%>%
rename(DBN=dbn)%>%
filter(schooltype=="High School")%>%
select(-2:-7,-33:-1942 )
spced_clean<-spced%>%
rename(DBN=dbn)%>%
filter(is.na(highschool))%>%
select(-2:-7, -33:-1773)
#Check missing values
colSums(is.na(gened_clean)) #Two missing student respondents
colSums(is.na(spced_clean)) #One missing student respondent
#Combine the dataframes
survey<-rbind(gened_clean,spced_clean)
combined<-combined%>%
right_join(survey, by="DBN") #right join so we only keep corresponding data that we can use to compare
```
Using right_join() helps keep all the survey data that corresponds to matching demographic and school performance data.
We analyze how perception relates to demographic and school performance via a correlation matrix.
<br></br>
### **Relationship Analysis**
#### Disparity in performance between NYC Boroughs
While SATs have been [questioned](https://www.washingtonpost.com/news/answer-sheet/wp/2017/04/19/34-problems-with-standardized-tests/) on their validity and reliability to accurately measure students, I believe they can be used as a metric to compare high schools.
```{r analysis1, include=TRUE, results='hide', message=FALSE, warning=FALSE, fig.align= "center"}
#Analyze the disparity in performance between the Boroughs
combined_boro<-combined%>%
filter(!is.na(boro))
combined_boro_longer<-combined_boro%>%
pivot_longer(cols = c(asian_per, black_per, hispanic_per, white_per),
names_to="race",
values_to="percent")
#Plot Performance disparity between Boroughs
fig1<-combined_boro_longer%>%ggplot(aes(x=boro, y=avg_sat_score, fill=boro))+
geom_boxplot()+
theme_minimal()+
labs(title = "Average school performance by \nBorough in NYC (2011)",
y="Average SAT Score",
x="Borough",
fill="Borough")+
theme(plot.title = element_text(hjust = 0.5, size = 12, face="bold"),
legend.position="none")
```
<center>
```{r include=TRUE, echo=FALSE, message=FALSE, warning=FALSE, fig.align= "center", fig.width=7.5, fig.height=5.5}
fig1<-ggplotly(fig1)%>%
partial_bundle()#%>% config(responsive=TRUE, displayModeBar = FALSE)
#api_create(fig1, filename = "Average school performance by Borough in NYC (2011)")
fig1
```
</center>
<br><br>
Schools in the Bronx are averaging lower SAT scores as compared to schools in Manhattan or Staten Island. Not surprisingly, poverty rates are higher in the Bronx than Staten Island or Manhattan.
Overall, school performance in New York City is indeed split along the demographic backgrounds and socio-economic lines.
<br><br>
<div class='tableauPlaceholder' id='viz1618409905548' style='position: relative'><noscript><a href='#'><img alt='NYC profile ' src='https://public.tableau.com/static/images/NY/NYCSocio-economicprofile/NYCprofile/1_rss.png' style='border: none' /></a></noscript><object class='tableauViz' style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='NYCSocio-economicprofile/NYCprofile' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https://public.tableau.com/static/images/NY/NYCSocio-economicprofile/NYCprofile/1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='language' value='en' /></object></div> <script type='text/javascript'> var divElement = document.getElementById('viz1618409905548'); var vizElement = divElement.getElementsByTagName('object')[0];if ( divElement.offsetWidth > 800 ) { vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.65)+'px';} else if ( divElement.offsetWidth > 500 ) { vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.65)+'px';} else { vizElement.style.width='100%';vizElement.style.height='650px';} var scriptElement = document.createElement('script'); scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js'; vizElement.parentNode.insertBefore(scriptElement, vizElement);
</script>
<br><br>
#### Trend of NYC school performance with racial diversity of schools
Pivot longer the dataset to ensure the diversity indicators are witin a single column and their values in a subsequent column. This will help during the plotting excercise.
```{r analysis, include=TRUE, results='hide', message=FALSE, warning=FALSE, fig.align= "center", fig.width=7, fig.height=6}
#Pivot table longer to allow plotting of different races on the same plot
combined_race_longer<-combined%>%
pivot_longer(cols = c(asian_per, black_per, hispanic_per, white_per),
names_to="race",
values_to="percent")
combined_type_longer<-combined%>%
pivot_longer(cols = c(frl_percent, ell_percent, sped_percent),
names_to="Type of program",
values_to="percent")
#A correlation matrix for the variables we are interested in
cor_mat<-combined%>%
select(avg_sat_score,frl_percent, ell_percent, sped_percent,
ends_with("_per"))%>%
select(-female_per)%>%
cor(use = "pairwise.complete.obs")
colnames(cor_mat)<-c("Avg_SAT_score", "% food dis",
"% learning eng","% Sped",
"% Asian", "% Black", "% Hispanic", "% White",
"% male")
rownames(cor_mat)<-c("Avg_SAT_score", "% food dis",
"% learning eng","% Sped",
"% Asian", "% Black", "% Hispanic", "% White",
"% male")
```
#### How racial diversity relates to school performance
Renaming of indicators to help improve the aesthetics of the chart.
```{r plot, include=TRUE, echo=TRUE, warning=FALSE, message=FALSE, fig.align= "center", fig.width=8, fig.height=6}
#Relationship between school performance and their racial diversity
race.labs<-c("Asian", "Black", "Hispanic", "White")
names(race.labs)<-c("asian_per", "black_per", "hispanic_per",
"white_per")
fig2<-combined_race_longer%>%ggplot(aes(x=percent, y=avg_sat_score, color=race))+
geom_point()+
theme_minimal()+
labs(title="Relationship between SAT scores & \nracial composition of schools",
x= "Percentage composition",
y= "Average SAT score")+
scale_color_manual(
name="Race",
values = c("Darkorange", "DarkGreen", "DarkBlue", "Purple"),
labels=c("Asian", "Black", "Hispanic", "White")
)+
theme(plot.title = element_text(hjust = 0.5,face="bold", size=12,
margin = margin(t = 0, r = 0, b = 10, l = 0)),
axis.title.y = element_text(margin = margin(t = 0, r = 16,
b = 0, l = 0)),
axis.title.x = element_text(margin = margin(t = 15, r = 0,
b = 0, l = 0)),
legend.position="none")+
facet_wrap(~race, labeller = labeller(race=race.labs))
fig2<-ggplotly(fig2)
#This helps find out how plotly has listed the annotations
#str(fig2[['x']][['layout']][['annotations']])
#Locates the x position of the yaxis titles
#fig2[['x']][['layout']][['annotations']][[2]][['x']]
#fig2[['x']][['layout']][['annotations']][[1]][['y']]
#move the y-axis title more left and x-axis title lower
fig2[['x']][['layout']][['annotations']][[2]][['x']] <- -0.07
fig2[['x']][['layout']][['annotations']][[1]][['y']] <- -0.05
fig2 %>% layout(margin = list(l = 75, t=75, b=50))
```
Schools with a higher proportion of Black and Hispanic have lower SAT scores as compared to schools with a higher share of White and Asian students.This could be due to the location, programs and resources at the disposal of schools.
Schools in boroughs like Staten Island and Manhattan would have more resources and have a higher proportion of White and Asian students as compared to schools in Bronx and Brooklyn.
#### How schools with special programs fair in academic performance
```{r perform, include=TRUE, warning=FALSE, message=FALSE, fig.align= "center", fig.width=8, fig.height=6}
#Trend of SAT scores with the type of program available for students
prog.labs<-c("Learning to speak English", "Receiving lunch discount",
"Receiving specialized teaching")
names(prog.labs)<-c("ell_percent","frl_percent", "sped_percent")
fig3<-combined_type_longer%>%ggplot(aes(x=percent, y=avg_sat_score, color=`Type of program`))+
geom_point()+
theme_minimal()+
labs(title="Relationship between SAT scores and socio-economic factors",
x= "Percentage composition",
y= "Average SAT score")+
scale_color_manual(
name="Student",
values = c("maroon", "forestgreen", "chocolate"),
labels=c("Receiving lunch \ndiscount", "Learning to speak \nEnglish", "Receiving specialized \nteaching")
)+
theme(plot.title = element_text(hjust = 0.5, face="bold",
margin = margin(t = 0, r = 0, b = 10, l = 0)),
axis.title.y = element_text(margin = margin(t = 0, r = 5,
b = 0, l = 0)),
axis.title.x = element_text(margin = margin(t = 15, r = 0,
b = 0, l = 0)),
legend.position="none")+
facet_wrap(~`Type of program`, labeller = labeller(`Type of program`=prog.labs))
fig3<-ggplotly(fig3)
#This helps find out how plotly has listed the annotations
#str(fig3[['x']][['layout']][['annotations']])
#Locates the x position of the yaxis titles
#fig3[['x']][['layout']][['annotations']][[2]][['x']]
#fig3[['x']][['layout']][['annotations']][[1]][['y']]
#move the y-axis title more left and x-axis title lower
fig3[['x']][['layout']][['annotations']][[2]][['x']] <- -0.06
fig3[['x']][['layout']][['annotations']][[1]][['y']] <- -0.05
fig3 %>% layout(margin = list(l = 75, t=75, b= 50))
```
The most significant observation is how drastically school performance drops with increase in number of students eligible for lunch discount at a school. This is an indication of a school being in a socially and economically disadvantaged location. Such schools rarely have enough resources to help students perform better academically.
An increase in the share of students learning to speak English is also related to lower academic performance for the school. Learning to speak English can be an indicator of share of students with a lower economic migrant background.
<br><br>
```{r programs, echo=FALSE, fig.width=8, fig.height=5.5, fig.align="center", message=FALSE, warning=FALSE}
combined2<-combined%>%
drop_na(boro)%>%
drop_na(frl_percent)%>%
drop_na(sped_percent)%>%
drop_na(ell_percent)
combined2<-combined2%>%
mutate(food=ifelse(frl_percent>=30, "Food discount", "Low"),
english=ifelse(ell_percent>=30, "Learning to speak English", "Low"),
sped=ifelse(sped_percent>=30, "Specialized teaching", "Low"))
combined2<-combined2%>%
pivot_longer(cols = c(food, english, sped),
names_to="program",
values_to="cat")
combined2$cat<-as.factor(combined2$cat)
combined2$cat<-ordered(combined2$cat, levels = c("Specialized teaching",
"Learning to speak English",
"Food discount","Low"))
#Availability vs Fishing rate
plotdata <- combined2%>%
group_by(boro, cat)%>%
summarize(n = n())%>%
mutate(pct = n/sum(n),
lbl = scales::percent(pct))
fig4<-plotdata%>%
ggplot(aes(x=boro, y=pct, fill=cat))+
geom_bar(stat = "identity", position = "fill")+
theme_minimal()+
theme(plot.title = element_text(color="black", size=12, face="bold",hjust = 0.5,
margin = margin(t = 0, r = 0, b = 10, l = 0)),
axis.ticks = element_blank(),
legend.title = element_blank(),
axis.title.x = element_text(color="black", size=12, face="bold",
margin = margin(t = 0, r = 0,
b = 10, l = 0)),
axis.title.y = element_text(color="black", size=12,
margin = margin(t = 0, r = 10,
b = 0, l = 0)))+
scale_y_continuous(breaks = seq(0, 1, .2),label = percent)+
geom_col(alpha = 0.8, width = 0.85) +
labs(title = "Percentage of schools with at least a third of \nstudents in special programs",
x="",
y="Percent")+
scale_fill_manual(name="Student",
values = c("dodgerblue2", "firebrick2", "tan3", "limegreen"),
labels=c("Receiving lunch \ndiscount", "Learning to speak \nEnglish", "Receiving specialized \nteaching"))
#Plotly to make the chart more interactive
fig4<- ggplotly(fig4)%>% add_annotations( text="Program",
xref="paper", yref="paper",
x=1.02, xanchor="left",
y=0.9, yanchor="bottom",
legendtitle=TRUE, showarrow=FALSE )%>%
layout(legend=list(y=0.9, yanchor="top"),xaxis=list(tickangle=-45))%>%
style(hoverinfo = 'none')#%>%config(responsive=TRUE, displayModeBar = FALSE)
#api_create(fig4, filename = "Percentage of schools with at least a third of students in special programs")
fig4
```
Creating an arbitrary cut-off of 30%, i looked into the share of schools in each borough with at least a third of its students eligible or participating in the special programs. Again, Staten Island had the fewest schools with students taking up special programs while the Bronx had the most. This matches with the socio-economic profiles of the respective boroughs.
This gap also translates to average SAT scores as shown in an earlier chart between the boroughs.
```{r perfrom2, include=TRUE, echo=TRUE, warning=FALSE, message=FALSE}
p.mat<-cor_pmat(cor_mat)#pvalues
cor_tib<-cor_mat%>%
as_tibble(rownames="variable")#A tibble for easier viewing
#Variables with strong relationship with Avg sat score
strong_cor<-cor_tib%>%
select(variable, Avg_SAT_score)%>%
filter(Avg_SAT_score < -0.25|Avg_SAT_score >0.25)
strong_cor_mat<-as.matrix(strong_cor)
#Plot matrix table to see strength of relationships
fig5 <- ggcorrplot(
cor_mat, hc.order = TRUE, type = "lower", title = "Correlation matrix between school \nperformance and demographics", outline.col = "white", p.mat = p.mat)+
theme(plot.title = element_text(size=12, face="bold", hjust = 0.5))
```
<center>
```{r include=TRUE, echo=FALSE,fig.align= "center", warning=FALSE, fig.width=7, fig.height=6,}
fig5<-ggplotly(fig5)#%>%config(responsive=TRUE, displayModeBar = FALSE)
#api_create(fig5, filename = "Correlation between school \nperformance and demographics")
fig5
```
</center>
A correlation analysis shows how strong the relationships between racial diversity and school programs are with the academic performance of the school.
When it comes to demographics, academic performance shows a strong positive correlation with the percentage of white students in a school (r= 0.65). While it has the strongest negative relationship with the percentage of students that are eligible for food discount at a school (r= -0.72).
#### How teachers, students and parents rate NYC schools
```{r rating, include=TRUE, echo=TRUE, warning=FALSE, message=FALSE, fig.align= "center", fig.width=7, fig.height=6}
#establish average scores per respondent group
survey_score<-combined%>%
mutate(parent_score_avg=(saf_p_11+com_p_11+eng_p_11+aca_p_11)/4)%>%
mutate(teacher_score_avg=(saf_t_11+com_t_11+eng_t_11+aca_t_11)/4)%>%
mutate(student_score_avg=(saf_s_11+com_s_11+eng_s_11+aca_s_11)/4)
#Pivot table longer to make it easier to plot on same scatter plot
survey_score_longer<-survey_score%>%
pivot_longer(cols = c(parent_score_avg, teacher_score_avg, student_score_avg),
names_to= "respondent",
values_to= "avg_score")
#Plot the spread of the respondents average scores
fig6<-survey_score_longer%>%ggplot(aes(x=respondent, y=avg_score, fill=respondent))+
geom_boxplot(show.legend = FALSE)+
theme_minimal()+
scale_x_discrete(labels=c("parent_score_avg" = "Parent", "student_score_avg" = "Student","teacher_score_avg" = "Teacher"))+
labs(title = "Perception of NYC schools quality \n(survey responses)",
y="Rating",
x="Respondent")+
theme(plot.title = element_text(hjust = 0.5, size=12, face="bold",
margin = margin(t = 0, r = 0, b = 10, l = 0)),
axis.title.y = element_text(margin = margin(t = 0, r = 10,
b = 0, l = 0)),
axis.title.x = element_text(margin = margin(t = 0, r = 0,
b = 10, l = 0)),
legend.position="none")
#fig6<-ggplotly(fig6)#%>%config(responsive=TRUE, displayModeBar = FALSE)
#api_create(fig6, filename = "Perception of NYC schools quality (survey responses)")
fig6
```
<br><br>
Overall, parents seem to rate NYC schools the highest while students (not surprisingly `r emo::ji("smile")`) are rating the schools lowest. Teachers show the most broad rating of NYC schools.
Ratings on school performance across the four benchmarks are really close for each respondent. 50% of the parents, for example, rate schools between 7.47 and 8.20, a difference of just 0.72.
This is an indication of how similar the perceptions of individual groups are amongst themselves.
#### Survey responses on quality of NYC schools
```{r questions, include=TRUE, echo=TRUE, warning=FALSE, message=FALSE, fig.align= "center", fig.width=8, fig.height=5.5}
#Pivot table longer to establish individual rating per question type
que_score<-combined%>%
pivot_longer(cols = c(saf_p_11:aca_s_11),
names_to="questions",
values_to="Rating")
#Create variables out of the pivoted column
que_score <- que_score %>%
mutate(respondent = str_sub(questions, 4, 6)) %>%
mutate(question = str_sub(questions, 1, 3))
que_score <- que_score%>%
mutate(respondent = ifelse(respondent == "_p_", "parent",
ifelse(respondent == "_t_", "teacher",
ifelse(respondent == "_s_", "student",
ifelse(respondent == "_to", "total", "NA")))))
#Boxplot to see if there appear to be differences in how the three groups of responders (parents, students, and teachers) answered the four questions.
que_score<-as.data.frame(que_score)
fig7<-plot_ly(data = que_score, x=~question, y =~Rating, color =~respondent,
type = "box") %>%
layout(boxmode = "group", title="<b>NYC School quality perception survey questions and responses<b>",
xaxis = list(title='survey questions',
ticktext = list("Academic \nExpectations", "Communication", "Engagement", "Safety and \nRespect"),
tickvals = list("aca","com","eng","saf"),
yaxis = list(title='Percentage')))#%>%config(responsive=TRUE, displayModeBar = FALSE)
#api_create(fig7, filename = "NYC School quality perception survey questions and responses")
fig7
```
<br><br>
Looking on how individual groups responded, the best rating was for **safety and respect** by parent respondents and the lowest was for **communication** by student respondents.
Interesting to note that across each borough, the responses to the survey were somewhat similar. The box plot below shows how even the worst academically performing borough (Bronx) has half the respondents rating schools at a score of 7.17 while Staten Island, which is better off academically and economically has half the respondents scoring their schools at 7.30 rating.
```{r boro, include=TRUE, echo=FALSE, warning=FALSE, message=FALSE, fig.align= "center", fig.width=7, fig.height=6}
#establish average scores per respondent group
survey_score<-combined%>%
mutate(parent_score_avg=(saf_p_11+com_p_11+eng_p_11+aca_p_11)/4)%>%
mutate(teacher_score_avg=(saf_t_11+com_t_11+eng_t_11+aca_t_11)/4)%>%
mutate(student_score_avg=(saf_s_11+com_s_11+eng_s_11+aca_s_11)/4)
#Pivot table longer to make it easier to plot on same scatter plot
survey_score_longer<-survey_score%>%
pivot_longer(cols = c(parent_score_avg, teacher_score_avg, student_score_avg),
names_to= "respondent",
values_to= "avg_score")
#Plot the spread of the respondents average scores
fig8<-survey_score_longer%>%
drop_na(boro)%>%
ggplot(aes(x=boro, y=avg_score, fill=boro))+
geom_boxplot(show.legend = FALSE)+
theme_minimal()+
labs(title = "Perception of NYC schools quality \nby Borough (survey responses)",
y="Rating",
x="Borough")+
theme(plot.title = element_text(hjust = 0.5, size=12, face="bold"),
legend.position="none")
fig8<-ggplotly(fig8)#%>%config(boro, displayModeBar = FALSE)
#api_create(fig8, filename = "Perception of NYC schools quality by Borough (survey responses)")
fig8
```
```{r corr2, echo=FALSE, fig.width=6, fig.height=5,fig.align='center', warning=FALSE}
comb<-combined%>%
select(avg_sat_score, contains("tot_11"))
cor_mat<-comb%>%
cor(use = "pairwise.complete.obs")
colnames(cor_mat)<-c("Avg_SAT_score", "Safety", "Communication", "Engagement",
"Academic")
rownames(cor_mat)<-c("Avg_SAT_score", "Safety", "Communication", "Engagement",
"Academic")
p.mat<-cor_pmat(cor_mat)#pvalues
cor_tib<-cor_mat%>%
as_tibble(rownames="variable")#A tibble for easier viewing
#Variables with strong relationship with Avg sat score
strong_cor<-cor_tib%>%
select(variable, Avg_SAT_score)%>%
filter(Avg_SAT_score < -0.25|Avg_SAT_score >0.25)
strong_cor_mat<-as.matrix(strong_cor)
#Plot matrix table to see strength of relationships
fig9 <- ggcorrplot(
cor_mat, hc.order = FALSE, type = "lower", title = "Correlation matrix between school \nperformance and perception",
outline.col = "white", p.mat = p.mat)+
theme(plot.title = element_text(size=10, face="bold", hjust = 0.5,
margin = margin(t = 0, r = 0, b = 10, l = 0)))
fig9
```
This proves what we saw in the correlation matrix, that how respondents rated schools do not have a strong correlation nor a statistically significant one with actual school performance. But how respondents rated schools do closely correlate to each other.
This could mean there is a perception bubble among teachers, students and parents. Sadly, it does not match reality. Actual school performance is not influencing how parents, students, and teachers actually perceive the quality of schools in their neighborhoods. Rather, it seems school quality rating is similar within the individual groups, an information bubble of parents, students, and teachers, individually.
Perceptions are a funny thing.
>“Humans see what they want to see.”
>
>`r tufte::quote_footer('--- Rick Riordan, The Lightning Thief')`