Used unsupervised learning techniques such as principal component analysis and k-means clustering to compare demographic data for the customers of a German mail-order company with demographics of the German population at large. Used the pandas and scikit-learn libraries to wrangle the data, reduce its dimensionality, and cluster it. Compared cluster sizes between the general population and the customer base to determine which features are aligned with the target audience.
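The cluster-size comparison can be illustrated with a minimal sketch. It assumes each person has already been assigned a cluster label (for instance by a k-means model fitted on the general population), and it uses only the Python standard library. The function names and the label lists are hypothetical and made up for illustration; they are not taken from the notebook or its data.

```python
from collections import Counter


def cluster_shares(labels, n_clusters):
    """Return the fraction of rows assigned to each cluster 0..n_clusters-1."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(k, 0) / total for k in range(n_clusters)]


def representation_ratio(population_labels, customer_labels, n_clusters):
    """Ratio of customer share to population share for each cluster.

    A ratio above 1 means the cluster is over-represented among customers,
    a ratio below 1 means it is under-represented. Clusters that are empty
    in the population get None instead of a ratio.
    """
    pop = cluster_shares(population_labels, n_clusters)
    cust = cluster_shares(customer_labels, n_clusters)
    return [c / p if p > 0 else None for p, c in zip(pop, cust)]


# Hypothetical cluster labels, invented for this example only.
population_labels = [0, 0, 1, 1, 2, 2, 2, 2]
customer_labels = [0, 0, 0, 1, 2, 2]

ratios = representation_ratio(population_labels, customer_labels, n_clusters=3)
for k, r in enumerate(ratios):
    shown = "n/a" if r is None else f"{r:.2f}"
    print(f"cluster {k}: customer/population share ratio = {shown}")
```

In this toy input, cluster 0 holds a larger share of customers than of the general population, so it would be the cluster to inspect when looking for features of the target audience.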
No libraries are needed to run the code beyond those included in the Jupyter Notebook. The code runs with no issues on Python 3.
This project was part of the Udacity Data Science program and had to be completed in order to obtain a certificate.
There is only one notebook here, and it contains the full analysis for this project. The notebook explains how the customer segments were identified, and Markdown cells walk through all the steps.
The main findings are reported in the same Jupyter Notebook.