https://kathylt.shinyapps.io/info201-group-project/
The dataset we will use for our project covers police killings in 2015 and comes from FiveThirtyEight. It can be found at https://github.com/fivethirtyeight/data/tree/master/police-killings. To compile it, FiveThirtyEight linked entries from the Guardian's database on police killings (the Guardian is a major British newspaper) with census information from the American Community Survey. The result is a dataset that merges specific information about each killing, such as the street address and the law enforcement agency that had jurisdiction, with demographic information about the location where the killing took place.

Our target audience is everyday people of voting age. Because of the politics surrounding police killings and gun violence, it is crucial that this information and the possible correlations in the data are communicated in a way that lets anyone in this audience form their own opinions by seeing the information visually and through a new lens. Our audience most likely wants to see relationships and trends in the data in order to make sense of these devastating killings and the details surrounding them.

We aim to investigate and answer these questions and more:
- Is the relationship between the number of killings and the victim's ethnicity the same throughout the country?
- How does the frequency of killings of armed versus unarmed victims differ across ethnicities?
- Which gender, age range, and race account for the most police killings in 2015?
The police killings dataset we are working with is a static CSV file that we will read in. We will first clean the data by using dplyr to select only the columns we wish to work with. The dataset contains a wide range of details about each killing that are informative but do not all play a role in what we wish to visualize. Examples are the columns holding the county and state FIPS codes and various other id codes.

The major libraries we will use to create our visualizations are ggplot2 and plotly. We will mainly use plotly for our map and anticipate using ggplot2 for some of our simpler tables and plots, each of which will cover a specific relationship between two or three categorical variables. After narrowing the data down to the variables we will analyze, we will display the correlations or relationships we find with the clearest visual encoding possible (color, size, etc.) in our plots, charts, and map.

One major challenge we anticipate is narrowing the data down to the most significant relationships. There are many different facets of the dataset to explore, and we will need to balance showing how these facets interact against keeping the visualizations clear. Another major challenge is collaborating on GitHub. Since we are all beginners at Git collaboration, we will most likely run into small technical bumps, which could make our final visualizations feel less unified. However, with consistent communication, we will strive to overcome these difficulties.
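
As a rough illustration of the cleaning step described above, the sketch below selects a handful of columns with dplyr and counts killings by ethnicity and armed status. It is not our final code: the column names (`raceethnicity`, `armed`, `state_fp`, `county_fp`, `geo_id`, and so on) and the file name `police_killings.csv` are assumptions about the published file, and the rows are invented placeholders so the example runs on its own.

```r
library(dplyr)

# Toy stand-in for the real CSV. The rows are invented placeholders, and the
# column names are assumed to match the published file.
killings <- tibble(
  name = c("Person A", "Person B", "Person C", "Person D"),
  age = c(25, 41, 33, 19),
  gender = c("Male", "Male", "Female", "Male"),
  raceethnicity = c("Black", "White", "White", "Black"),
  state = c("WA", "TX", "CA", "WA"),
  state_fp = c(53, 48, 6, 53),
  county_fp = c(33, 201, 37, 61),
  geo_id = c(1, 2, 3, 4),
  armed = c("No", "Firearm", "Knife", "No")
)

# In the project itself the data would be read from the file instead, e.g.
# killings <- read.csv("police_killings.csv", stringsAsFactors = FALSE)

# Keep only the columns we plan to visualize; FIPS and id codes are dropped.
cleaned <- killings %>%
  select(name, age, gender, raceethnicity, state, armed)

# Count killings for each combination of ethnicity and armed status.
armed_by_race <- cleaned %>%
  count(raceethnicity, armed, name = "killings")

print(armed_by_race)
```

A summary table shaped like `armed_by_race` could then be passed to ggplot2 or plotly, with one categorical variable on an axis and the other encoded as color.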