This is the Capstone Project for Course 9, IBM Data Analyst Capstone Project, part of IBM's Data Analyst Professional Certificate on Coursera. Available here: https://www.coursera.org/programs/jda20232t1-z1hse/professional-certificates/ibm-data-analyst?collectionId=Wxyxq
We take on the role of a Data Analyst recently hired by a global IT and business consulting services firm that is known for its expertise in IT solutions and its team of highly experienced IT consultants. In this role, we will analyze several datasets to help identify trends in emerging technologies. To keep pace with changing technologies and remain competitive, our organization regularly analyzes data to help identify future skill requirements.
As a Data Analyst, we will be assisting with this initiative and have been tasked with collecting data from various sources and identifying trends for this year's report on emerging skills.
Our first task is to collect data on the technology skills that are most in demand from various sources, including job postings, blog posts, and surveys. We will begin by scraping websites and accessing APIs to collect data in various formats such as .csv files, Excel sheets, and databases.
Once we've collected enough data, we will prepare it for analysis using data wrangling techniques such as finding duplicates, removing duplicates, finding missing values, and imputing missing values.
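The wrangling steps named above can be sketched with pandas roughly as follows. This is a minimal illustration, not the project's actual notebook code: the tiny DataFrame is a made-up stand-in for the survey subset, and the choice of median and mode as imputation strategies is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for the survey subset (values are invented).
df = pd.DataFrame({
    "Respondent": [1, 2, 2, 3, 4],
    "WorkWeekHrs": [40.0, 45.0, 45.0, np.nan, 35.0],
    "Country": ["India", "Germany", "Germany", None, "India"],
})

# Finding duplicates: count rows that repeat an earlier row exactly.
n_duplicates = int(df.duplicated().sum())

# Removing duplicates: keep the first occurrence of each row.
df = df.drop_duplicates().reset_index(drop=True)

# Finding missing values: number of nulls per column.
missing_per_column = df.isnull().sum()

# Imputing missing values (assumed strategy): median for the numeric
# column, most frequent value for the categorical column.
df["WorkWeekHrs"] = df["WorkWeekHrs"].fillna(df["WorkWeekHrs"].median())
df["Country"] = df["Country"].fillna(df["Country"].mode()[0])

print(n_duplicates)
print(missing_per_column)
print(df)
```

Which imputation strategy is appropriate depends on the column; the median is less sensitive to extreme values than the mean, which is why it is used here for the numeric column.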
Once the data is ready, we will apply statistical techniques to analyze it and identify insights and trends, such as: What are the top programming languages in demand? What are the top database skills in demand? What are the most popular IDEs? We will also look at demographic data such as the gender and age distribution of developers.
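A question like "which languages are most in demand" can be approached by counting the options in a multi-select column. The sketch below assumes that such columns (for example `LanguageWorkedWith`) store several answers in one cell separated by semicolons; the helper name `top_values` and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; the real column comes from the survey CSV.
df = pd.DataFrame({
    "LanguageWorkedWith": [
        "Python;SQL;JavaScript",
        "JavaScript;HTML/CSS",
        "Python;JavaScript",
        None,
    ]
})


def top_values(column: pd.Series, sep: str = ";", n: int = 10) -> pd.Series:
    """Count how often each option appears in a multi-select column."""
    return (
        column.dropna()        # ignore respondents who skipped the question
        .str.split(sep)        # one list of options per respondent
        .explode()             # one row per (respondent, option) pair
        .str.strip()
        .value_counts()        # most frequent option first
        .head(n)
    )


top_languages = top_values(df["LanguageWorkedWith"], n=2)
print(top_languages)
```

The same helper could be pointed at `DatabaseWorkedWith` or `DevEnviron` to answer the database and IDE questions, provided those columns follow the same separator convention.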
In the fourth task, we'll focus on choosing appropriate visualizations for the data we want to present, using charts, plots, and histograms to reveal our findings and trends. We will access the data from a SQL database and pull only the data we need into DataFrames.
For task 5, we will use IBM Cognos Analytics or Google Looker Studio to create interactive dashboards that help analyze and present the data dynamically.
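Pulling only the needed columns and rows from a SQL database into a DataFrame might look like the sketch below. The source does not say which database engine or table names are used, so the in-memory SQLite database and the `survey` table here are assumptions made purely for illustration.

```python
import sqlite3

import pandas as pd

# Assumption: an in-memory SQLite database with a hypothetical "survey" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE survey (Respondent INTEGER, Age REAL, ConvertedComp REAL)"
)
conn.executemany(
    "INSERT INTO survey VALUES (?, ?, ?)",
    [(1, 25, 50000), (2, 31, 72000), (3, 42, None)],
)
conn.commit()

# Select only the columns and rows we need instead of loading the whole table.
query = "SELECT Respondent, Age FROM survey WHERE Age >= ? ORDER BY Respondent"
df_age = pd.read_sql_query(query, conn, params=(30,))
conn.close()

print(df_age)
```

Filtering in the query keeps the DataFrame small, which matters more as the table grows; the resulting frame can then be passed straight to a plotting call such as a histogram of `Age`.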
For the final task, we will use our storytelling skills to provide a narrative and present the findings of our analysis. Full presentation link: https://www.canva.com/design/DAGCO32O1hs/i6ag-UXsZqQ8_E5A-mI9bA/edit?utm_content=DAGCO32O1hs&utm_campaign=designshare&utm_medium=link2&utm_source=sharebutton
Stack Overflow, a popular website for developers, conducted an online survey of software professionals across the world. The survey data was later open sourced by Stack Overflow. The actual data set has around 90,000 responses.
The dataset we are going to use comes from the following source: https://stackoverflow.blog/2019/04/09/the-2019-stack-overflow-developer-survey-results-are-in/ and is released under the Open Database License (ODbL).
We will be given a subset of the original data set in this capstone project. We will explore, analyze, and visualize this dataset and present our analysis.
Note: This randomised subset contains around 1/10th of the original data set. Any conclusions we draw after analyzing this subset may not reflect the real-world scenario.
The dataset is available as a .csv file here.
The below table lists the questions asked in the survey and the column under which the response was collected.
Column Name | Question Text |
---|---|
Respondent | Randomized respondent ID number (not in order of survey response time) |
MainBranch | Which of the following options best describes you today? Here, by “developer” we mean “someone who writes code.” |
Hobbyist | Do you code as a hobby? |
OpenSourcer | How often do you contribute to open source? |
OpenSource | How do you feel about the quality of open source software (OSS)? |
Employment | Which of the following best describes your current employment status? |
Country | In which country do you currently reside? |
Student | Are you currently enrolled in a formal, degree-granting college or university program? |
EdLevel | Which of the following best describes the highest level of formal education that you’ve completed? |
UndergradMajor | What was your main or most important field of study? |
EduOther | Which of the following types of non-degree education have you used or participated in? Please select all that apply. |
OrgSize | Approximately how many people are employed by the company or organization you work for? |
DevType | Which of the following describe you? Please select all that apply. |
YearsCode | Including any education, how many years have you been coding? |
Age1stCode | At what age did you write your first line of code or program? (E.g., webpage, Hello World, Scratch project) |
YearsCodePro | How many years have you coded professionally (as a part of your work)? |
CareerSat | Overall, how satisfied are you with your career thus far? |
JobSat | How satisfied are you with your current job? (If you work multiple jobs, answer for the one you spend the most hours on.) |
MgrIdiot | How confident are you that your manager knows what they’re doing? |
MgrMoney | Do you believe that you need to be a manager to make more money? |
MgrWant | Do you want to become a manager yourself in the future? |
JobSeek | Which of the following best describes your current job-seeking status? |
LastHireDate | When was the last time that you took a job with a new employer? |
LastInt | In your most recent successful job interview (resulting in a job offer), you were asked to… (check all that apply) |
FizzBuzz | Have you ever been asked to solve FizzBuzz in an interview? |
JobFactors | Imagine that you are deciding between two job offers with the same compensation, benefits, and location. Of the following factors, which 3 are MOST important to you? |
ResumeUpdate | Think back to the last time you updated your resumé, CV, or an online profile on a job site. What is the PRIMARY reason that you did so? |
CurrencySymbol | Which currency do you use day-to-day? If your answer is complicated, please pick the one you’re most comfortable estimating in. |
CurrencyDesc | Which currency do you use day-to-day? If your answer is complicated, please pick the one you’re most comfortable estimating in. |
CompTotal | What is your current total compensation (salary, bonuses, and perks, before taxes and deductions), in CurrencySymbol? Please enter a whole number in the box below, without any punctuation. If you are paid hourly, please estimate an equivalent weekly, monthly, or yearly salary. If you prefer not to answer, please leave the box empty. |
CompFreq | Is that compensation weekly, monthly, or yearly? |
ConvertedComp | Salary converted to annual USD salaries using the exchange rate on 2019-02-01, assuming 12 working months and 50 working weeks. |
WorkWeekHrs | On average, how many hours per week do you work? |
WorkPlan | How structured or planned is your work? |
WorkChallenge | Of these options, what are your greatest challenges to productivity as a developer? Select up to 3: |
WorkRemote | How often do you work remotely? |
WorkLoc | Where would you prefer to work? |
ImpSyn | For the specific work you do, and the years of experience you have, how do you rate your own level of competence? |
CodeRev | Do you review code as part of your work? |
CodeRevHrs | On average, how many hours per week do you spend on code review? |
UnitTests | Does your company regularly employ unit tests in the development of their products? |
PurchaseHow | How does your company make decisions about purchasing new technology (cloud, AI, IoT, databases)? |
PurchaseWhat | What level of influence do you, personally, have over new technology purchases at your organization? |
LanguageWorkedWith | Which of the following programming, scripting, and markup languages have you done extensive development work in over the past year, and which do you want to work in over the next year? (If you both worked with the language and want to continue to do so, please check both boxes in that row.) |
LanguageDesireNextYear | Which of the following programming, scripting, and markup languages have you done extensive development work in over the past year, and which do you want to work in over the next year? (If you both worked with the language and want to continue to do so, please check both boxes in that row.) |
DatabaseWorkedWith | Which of the following database environments have you done extensive development work in over the past year, and which do you want to work in over the next year? (If you both worked with the database and want to continue to do so, please check both boxes in that row.) |
DatabaseDesireNextYear | Which of the following database environments have you done extensive development work in over the past year, and which do you want to work in over the next year? (If you both worked with the database and want to continue to do so, please check both boxes in that row.) |
PlatformWorkedWith | Which of the following platforms have you done extensive development work for over the past year? (If you both developed for the platform and want to continue to do so, please check both boxes in that row.) |
PlatformDesireNextYear | Which of the following platforms have you done extensive development work for over the past year? (If you both developed for the platform and want to continue to do so, please check both boxes in that row.) |
WebFrameWorkedWith | Which of the following web frameworks have you done extensive development work in over the past year, and which do you want to work in over the next year? (If you both worked with the framework and want to continue to do so, please check both boxes in that row.) |
WebFrameDesireNextYear | Which of the following web frameworks have you done extensive development work in over the past year, and which do you want to work in over the next year? (If you both worked with the framework and want to continue to do so, please check both boxes in that row.) |
MiscTechWorkedWith | Which of the following other frameworks, libraries, and tools have you done extensive development work in over the past year, and which do you want to work in over the next year? (If you both worked with the technology and want to continue to do so, please check both boxes in that row.) |
MiscTechDesireNextYear | Which of the following other frameworks, libraries, and tools have you done extensive development work in over the past year, and which do you want to work in over the next year? (If you both worked with the technology and want to continue to do so, please check both boxes in that row.) |
DevEnviron | Which development environment(s) do you use regularly? Please check all that apply. |
OpSys | What is the primary operating system in which you work? |
Containers | How do you use containers (Docker, Open Container Initiative (OCI), etc.)? |
BlockchainOrg | How is your organization thinking about or implementing blockchain technology? |
BlockchainIs | Blockchain / cryptocurrency technology is primarily: |
BetterLife | Do you think people born today will have a better life than their parents? |
ITperson | Are you the “IT support person” for your family? |
OffOn | Have you tried turning it off and on again? |
SocialMedia | What social media site do you use the most? |
Extraversion | Do you prefer online chat or IRL conversations? |
ScreenName | What do you call it? |
SOVisit1st | To the best of your memory, when did you first visit Stack Overflow? |
SOVisitFreq | How frequently would you say you visit Stack Overflow? |
SOVisitTo | I visit Stack Overflow to… (check all that apply) |
SOFindAnswer | On average, how many times a week do you find (and use) an answer on Stack Overflow? |
SOTimeSaved | Think back to the last time you solved a coding problem using Stack Overflow, as well as the last time you solved a problem using a different resource. Which was faster? |
SOHowMuchTime | About how much time did you save? If you’re not sure, please use your best estimate. |
SOAccount | Do you have a Stack Overflow account? |
SOPartFreq | How frequently would you say you participate in Q&A on Stack Overflow? By participate we mean ask, answer, vote for, or comment on questions. |
SOJobs | Have you ever used or visited Stack Overflow Jobs? |
EntTeams | Have you ever used Stack Overflow for Enterprise or Stack Overflow for Teams? |
SOComm | Do you consider yourself a member of the Stack Overflow community? |
WelcomeChange | Compared to last year, how welcome do you feel on Stack Overflow? |
SONewContent | Would you like to see any of the following on Stack Overflow? Check all that apply. |
Age | What is your age (in years)? If you prefer not to answer, you may leave this question blank. |
Gender | Which of the following do you currently identify as? Please select all that apply. If you prefer not to answer, you may leave this question blank. |
Trans | Do you identify as transgender? |
Sexuality | Which of the following do you currently identify as? Please select all that apply. If you prefer not to answer, you may leave this question blank. |
Ethnicity | Which of the following do you identify as? Please check all that apply. If you prefer not to answer, you may leave this question blank. |
Dependents | Do you have any dependents (e.g., children, elders, or others) that you care for? |
SurveyLength | How do you feel about the length of the survey this year? |
SurveyEase | How easy or difficult was this survey to complete? |
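
The `ConvertedComp` row in the table above describes an annualization rule (12 working months, 50 working weeks). A rough sketch of that arithmetic is shown below. The `CompFreq` labels "Weekly", "Monthly", and "Yearly", the sample rows, and the placeholder `usd_rate` of 1.0 are assumptions; the real conversion uses per-currency exchange rates that are not reproduced here.

```python
import pandas as pd

# Pay periods per year, following the 50-week / 12-month assumption
# stated in the ConvertedComp description.
PERIODS_PER_YEAR = {"Weekly": 50, "Monthly": 12, "Yearly": 1}

# Hypothetical sample rows (values are invented).
df = pd.DataFrame({
    "CompTotal": [1000, 4000, 60000, 5000],
    "CompFreq": ["Weekly", "Monthly", "Yearly", None],
})

# Placeholder exchange rate; a real conversion would look this up per currency.
usd_rate = 1.0

# Rows with no frequency map to NaN, so their annual figure stays missing.
df["AnnualComp"] = df["CompTotal"] * df["CompFreq"].map(PERIODS_PER_YEAR) * usd_rate

print(df)
```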
- python v3.12.2
- pandas for managing the data.
- numpy for mathematical operations.
- seaborn for visualizing the data.
- matplotlib for additional plotting tools.
- folium for geospatial data visualization such as choropleth maps.
- plotly for interactive plotting tools.
- Google Looker Studio for dashboards.
- IBM Cognos Analytics for dashboards.
- Collecting Data Using APIs
- Collecting Data Using Web Scraping
- Exploring Data
- Finding Missing Values
- Determine Missing Values
- Finding Duplicates
- Removing Duplicates
- Normalizing Data
- Distribution
- Outliers
- Correlation
- Visualizing Distribution of Data
- Relationship
- Composition
- Comparison
- Dashboards
- Final Presentation
- Create Dashboard in Google Looker or Tableau