This repository contains Team 2's work for Project 1. Each team member was assigned a hypothesis, and the folders are organized by hypothesis to show each member's code. An additional folder contains the final report and presentation slides.
For Project 1, Team 2 analyzed a Kaggle dataset containing various data points on legal cannabis. The hypotheses developed for this study are as follows:
- Hypothesis 1: Do hybrid types have more happy effects compared to non-hybrid types?
- Hypothesis 2: Does sweet-flavored cannabis have the highest ratings?
- Hypothesis 3: Do the breeders with the highest count of unique strains have the highest average ratings?
- Hypothesis 4: Is there a significant difference in ratings by type or ratings by effects?
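As a rough illustration of how a question like Hypothesis 1 can be framed in code, the sketch below compares the share of strains reporting a "Happy" effect between hybrid and non-hybrid types. It is not the team's actual analysis: the record layout (`type`, `effects`), the sample rows, and the `happy_share` helper are all assumptions made for this example, and the result is a descriptive comparison rather than a significance test.

```python
# Illustrative records only; the field names and values are assumptions,
# not rows taken from the Kaggle dataset.
strains = [
    {"type": "hybrid", "effects": ["Happy", "Relaxed"]},
    {"type": "hybrid", "effects": ["Happy", "Uplifted"]},
    {"type": "hybrid", "effects": ["Sleepy"]},
    {"type": "indica", "effects": ["Happy"]},
    {"type": "sativa", "effects": ["Energetic"]},
]


def happy_share(records, want_hybrid):
    """Return the fraction of strains in the chosen group that list 'Happy'.

    want_hybrid=True selects hybrid strains; False selects all other types.
    Returns None when the group is empty.
    """
    group = [r for r in records if (r["type"].lower() == "hybrid") == want_hybrid]
    if not group:
        return None
    happy = sum(1 for r in group if "happy" in (e.lower() for e in r["effects"]))
    return happy / len(group)


hybrid_share = happy_share(strains, want_hybrid=True)
other_share = happy_share(strains, want_hybrid=False)
print(f"hybrid: {hybrid_share:.2f}, non-hybrid: {other_share:.2f}")
```

A difference between the two shares only suggests a pattern; whether it is meaningful would still need a statistical test on the full dataset.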
In light of expanding legislation and public discussion with cannabis at the forefront, Team 2's research raised several critical questions about the multifaceted ways cannabis could have an impact. For example, what societal, economic, or health implications are at play when states decide to resist the legalization of cannabis? How does the dynamic between breeders and their consumers change or evolve in states where cannabis is recreationally and/or medicinally legal vs. states where it is currently up for debate? The goal of addressing these questions is to understand the various government, industry, and societal perspectives that have made cannabis use more appealing in contemporary society.
The original dataset is titled “Cannabis Strains.” It includes each strain's name, type, rating, effects, taste, and description. Once the raw data was pulled, the main goal of the data cleaning process was to preserve enough information to keep the findings reliably supported. Team 2 refined the data columns and extracted breeder and location information from the description section. This resulted in a dataset of 2,351 rows that was used to examine the hypotheses and interpret their validity. Using the clean, final dataset as a guide, four hypotheses were generated, each serving as a focal point for exploring the variety of information in the dataset and drawing out key insights.
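The breeder extraction described above could be approached with a regular expression over the description text. The following is a minimal sketch under stated assumptions, not the cleaning code used in this repository: the "bred by / created by / developed by" phrasing, the `BREEDER_PATTERN` and `extract_breeder` names, and the sample rows are all hypothetical.

```python
import re

# Hypothetical pattern: assumes a description names the breeder in a phrase such
# as "bred by <Name>", where <Name> is one or more capitalized words.
BREEDER_PATTERN = re.compile(
    r"\b(?:bred|created|developed) by ([A-Z][\w'&.-]*(?: [A-Z][\w'&.-]*)*)"
)


def extract_breeder(description):
    """Return the breeder name found in a description, or None if none is found."""
    if not description:
        return None
    match = BREEDER_PATTERN.search(description)
    # Trim sentence punctuation that may trail the captured name.
    return match.group(1).rstrip(".,") if match else None


# Illustrative rows only; these are not records from the dataset.
rows = [
    {"name": "Example Strain A", "description": "A hybrid bred by Example Seeds in Colorado."},
    {"name": "Example Strain B", "description": "An earthy indica with no breeder listed."},
]
for row in rows:
    row["breeder"] = extract_breeder(row["description"])
    print(row["name"], "->", row["breeder"])
```

Rows where no breeder can be found are left as `None` rather than guessed, so they can be excluded or reviewed by hand before any breeder-level analysis.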