This portfolio is a compilation of the data science and data analysis projects I have done for academic, self-learning, and hobby purposes. It also contains my achievements, skills, and certificates, and is updated on a regular basis.
- Email: [email protected]
- LinkedIn: linkedin.com/archd3sai
- Recipient of the Outstanding Master of Engineering - Industrial Engineering Student Award.
- Publication: Prognosis of Wind Turbine Gearbox Bearing Failures using SCADA and Modeled Data, Proceedings of the Annual Conference of the PHM Society 2020, Vol. 12 No. 1.
- Winner of the TAMU Datathon 2020 among 50+ teams.
- Recipient of TAMU Scholarship and Fee Waiver for excellent academic performance (4.0 GPA).
Customer Survival Analysis and Churn Prediction
In this project, I used survival analysis to study how the likelihood of customer churn changes over time. I also implemented a Random Forest model to predict customer churn and deployed it as a Flask web app on Heroku.
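As a rough illustration of the survival-analysis idea (not the project's actual code), a Kaplan-Meier estimate of the survival curve can be sketched in plain Python. The function name `kaplan_meier` and the toy tenure data are assumptions made for this example.

```python
from collections import Counter

def kaplan_meier(durations, churned):
    """Return [(time, survival_probability)] at each time a churn event occurs.

    durations: observed tenure of each customer (e.g. in months)
    churned:   True if the customer churned at that time, False if censored
    """
    events = Counter(t for t, e in zip(durations, churned) if e)
    exits = Counter(durations)      # everyone leaving the risk set at time t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the share of at-risk customers who did not churn at t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

# Hypothetical toy data: tenure in months and whether churn was observed
durations = [1, 2, 2, 3, 4, 5, 5, 6]
churned = [True, True, False, True, False, True, True, False]
curve = kaplan_meier(durations, churned)
```

Each step of the curve is the estimated probability that a customer is still active after that tenure. Censored customers (those who have not churned yet) shrink the risk set without lowering the curve.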
Instacart Market Basket Analysis
The objective of this project is to analyze 3 million grocery orders from more than 200,000 Instacart users and predict which previously purchased items will be in a user's next order. Customer segmentation and affinity analysis are also performed to study user purchase patterns.
Hybrid-filtering News Articles Recommendation Engine
A hybrid-filtering personalized news article recommendation system that suggests articles from popular news service providers based on the reading history of Twitter users who share similar interests (collaborative filtering) and on the content similarity between an article and the user’s tweets (content-based filtering).
Predictive Maintenance of Aircraft Engine
In this project, I used models such as RNN, LSTM, and 1D-CNN to predict engine failure 50 cycles ahead, and calculated feature importance from them using sensitivity analysis and SHAP values. Exponential degradation and similarity-based models are also used to estimate the engine's remaining life.
Wind Turbine Power Curve Estimation
In this project, I employed regression techniques to estimate the power curve of an onshore wind turbine. Nonlinear tree-based regression methods perform best because the true power curve is nonlinear. XGBoost, tuned with GridSearchCV, yields the lowest test RMSE of 6.404.
The objective of this project is to identify the in-control data points and eliminate the out-of-control data points in order to set up distribution parameters for manufacturing process monitoring. I used PCA for dimension reduction and Hotelling T² and m-CUSUM control charts to establish the mean vector and covariance matrix.
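The iterative Phase I idea with a Hotelling T² chart can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's code: it handles only two variables, uses a chi-square approximation for the control limit, and the function names and the data are hypothetical.

```python
import math

def mean_and_cov(points):
    """Sample mean and 2x2 sample covariance of (x, y) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def t2(point, mean, cov):
    """Hotelling T^2 = (x - mean)' S^-1 (x - mean) for the 2-D case."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def phase_one(points, alpha=0.0027):
    """Drop out-of-control points and re-estimate until every point is within the limit."""
    # Assumption: chi-square limit with 2 degrees of freedom, whose upper
    # alpha quantile is exactly -2 * ln(alpha)
    ucl = -2.0 * math.log(alpha)
    data = list(points)
    while True:
        mean, cov = mean_and_cov(data)
        kept = [p for p in data if t2(p, mean, cov) <= ucl]
        if len(kept) == len(data):
            return mean, cov, kept
        data = kept

# Hypothetical data: 40 well-behaved points plus one obvious outlier
inliers = [(math.cos(k), math.sin(2 * k)) for k in range(40)]
mean, cov, kept = phase_one(inliers + [(10.0, -10.0)])
```

The returned mean and covariance are the in-control parameter estimates that a monitoring chart would then use for new observations.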
The objective of this project is to perform a predictive assessment of the GDP of India through an inferential analysis of various socio-economic factors. Various models are compared, and a stepwise regression model is implemented, which resulted in a test MSE of 5.7%.
In this project, I applied various classification models, such as Logistic Regression, Random Forest, and LightGBM, to detect consumers who will default on a loan. SMOTE is used to combat class imbalance, and LightGBM achieved the highest accuracy (98.89%) and an F1 score of 0.99.
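The core idea behind SMOTE is to create synthetic minority-class samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below is a minimal plain-Python illustration of that idea, not the library implementation used in the project; the function name `smote`, its parameters, and the sample points are assumptions.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

# Hypothetical minority-class samples with two features
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_samples = smote(minority, n_new=6)
```

Oversampling of this kind should be applied to the training split only, so that the test set keeps the original class balance.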
-
- Genetic Algorithm: In this file, I implemented a simple genetic algorithm that finds a list of numbers that sum to any specified number.
- Bayesian Statistics: In this file, I explored how Bayesian statistics works and how prior assumptions affect posterior probabilities, using a gun control example.
- Gaussian Mixture Model and Expectation Maximization: In this file, I implemented the Expectation Maximization algorithm to recover the true distribution of a one-dimensional GMM with two Gaussians.
- Linear Regression: In this file, I solve linear regression analytically and by implementing gradient descent, stochastic gradient descent, and mini-batch gradient descent.
- Neural Network Implementation: In this file, I implemented a simple neural network using forward propagation, backward propagation, and optimization functions to predict customer churn.
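To illustrate the linear regression item above, here is a minimal sketch comparing the analytical solution with full-batch gradient descent for a single feature. It is an illustrative example rather than the code in the repository; the function names, learning rate, and data are assumptions.

```python
def fit_closed_form(xs, ys):
    """Least-squares slope and intercept from the normal equations (one feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def fit_gradient_descent(xs, ys, lr=0.05, epochs=5000):
    """Minimise mean squared error with full-batch gradient descent."""
    n = len(xs)
    slope, intercept = 0.0, 0.0
    for _ in range(epochs):
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        grad_slope = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2.0 / n * sum(errors)
        slope -= lr * grad_slope
        intercept -= lr * grad_intercept
    return slope, intercept

# Hypothetical data lying roughly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]
exact = fit_closed_form(xs, ys)
approx = fit_gradient_descent(xs, ys)
```

With a small enough learning rate, the gradient descent estimate converges to the analytical one. The stochastic and mini-batch variants follow the same update rule but compute the gradient on one sample or a small random subset per step.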
-
- SQL Challenges: This repository contains my solutions to online SQL challenges (from HackerRank, LeetCode, TestDome, etc.).
- Data Science Challenges: This repository contains my solutions to online data science challenges (from HackerRank, TestDome, etc.).
-
- Ranking of NFL teams using Markov-chain methods: In this project, I implemented and compared three approaches based on the stationary distribution of a Markov chain to rank the 32 NFL (National Football League) teams from "Best" to "Worst" using the scores of the 2007 NFL regular season.
- Ranking of Tennis players: The objective of this project is to rank all tennis players based on the matches they played in 2018. The project comprises four ranking approaches, each one designed to be more robust than the previous one.
- Methodologies: Machine Learning, Deep Learning, Time Series Analysis, Natural Language Processing, Statistics, Explainable AI, A/B Testing and Experimentation Design, Big Data Analytics
- Languages: Python (pandas, NumPy, scikit-learn, SciPy, Keras, Matplotlib), R (dplyr, tidyr, caret, ggplot2), SQL, C++
- Tools: MySQL, Tableau, Git, PySpark, Amazon Web Services (AWS), Flask, MS Excel
- Tableau Essential Training By LinkedIn
- Machine Learning Explainability By Kaggle
- Apache PySpark Training By LinkedIn
- SQL Essential Training By LinkedIn
- SQL Test By HackerRank
- SQL Test By TestDome
- Data Science Test By TestDome
- Deep Learning Specialization By deeplearning.ai
- Big Data 101 By Cognitiveclass.ai
- Google Analytics for Beginners By Google