Model-based techniques to extract information from data
Machine Learning, GraphLab, NumPy, SciPy
Tasks:
- Identify potential applications of machine learning in practice.
- Describe the core differences in analyses enabled by regression, classification, and clustering.
- Select the appropriate machine learning task for a potential application.
- Apply regression, classification, clustering, retrieval, recommender systems, and deep learning.
- Represent data as features to serve as input to machine learning models (see the feature-representation sketch after this list).
- Assess the model quality in terms of relevant error metrics for each task.
- Utilize a dataset to fit a model to analyze new data.
- Build an end-to-end application that uses machine learning at its core.
- Implement these techniques in Python.
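A minimal sketch of turning raw records into a feature matrix, assuming scikit-learn's DictVectorizer; the records and their fields (sqft, bedrooms, city) are made up for illustration, and any real dataset would supply its own.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical raw records mixing numeric and categorical fields.
records = [
    {"sqft": 1400, "bedrooms": 3, "city": "Seattle"},
    {"sqft": 900,  "bedrooms": 2, "city": "Portland"},
    {"sqft": 2100, "bedrooms": 4, "city": "Seattle"},
]

# DictVectorizer one-hot encodes string fields and passes numeric fields
# through, producing a feature matrix a model can consume directly.
vectorizer = DictVectorizer(sparse=False)
X = vectorizer.fit_transform(records)
print(vectorizer.feature_names_)
print(X)
```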
Models:
- Linear Regression
- Regularization: Ridge (L2), Lasso (L1) (see the sketch after this list)
- Nearest neighbor and kernel regression
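A minimal sketch contrasting Ridge and Lasso fits, assuming scikit-learn and synthetic data in which only two of ten features carry signal; the point is that the L1 penalty drives irrelevant coefficients to exactly zero, while the L2 penalty only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: only the first two of ten features actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks coefficients
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: zeroes out coefficients

print("nonzero ridge coefficients:", np.count_nonzero(ridge.coef_))
print("nonzero lasso coefficients:", np.count_nonzero(lasso.coef_))
```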
Algorithms:
- Gradient descent (see the sketch after this list)
- Coordinate descent
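A NumPy sketch of gradient descent on the L2-regularized least-squares objective; the step size, penalty, and iteration count are illustrative choices, not tuned values.

```python
import numpy as np

def ridge_gradient_descent(X, y, l2_penalty=0.1, step_size=1e-3, n_iters=1000):
    """Minimize ||y - Xw||^2 + l2_penalty * ||w||^2 by gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        residual = y - X @ w                                  # prediction errors
        gradient = -2 * X.T @ residual + 2 * l2_penalty * w   # d/dw of the objective
        w -= step_size * gradient                             # step opposite the gradient
    return w

# Toy usage: recover weights close to [2, -1] from noisy observations.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=100)
print(ridge_gradient_descent(X, y))
```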
Concepts:
- Loss functions
- Bias-variance tradeoff
- Cross validation (see the sketch after this list)
- Sparsity
- Overfitting
- Model selection
- Feature selection
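A sketch of selecting the ridge penalty by k-fold cross validation, assuming scikit-learn; the candidate alpha values and the synthetic data are placeholders.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data; two of the five features carry no signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.3, size=200)

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
for alpha in (0.01, 0.1, 1.0, 10.0):
    # cross_val_score returns one held-out score per fold.
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=kfold,
                             scoring="neg_mean_squared_error")
    print(f"alpha={alpha:>5}: mean held-out MSE = {-scores.mean():.4f}")
```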
Tasks:
- Describe the input and output of a regression model.
- Compare and contrast bias and variance when modeling data.
- Estimate model parameters using optimization algorithms.
- Tune parameters with cross validation.
- Analyze the performance of the model.
- Describe the notion of sparsity and how LASSO leads to sparse solutions.
- Deploy methods to select between models.
- Exploit the model to form predictions.
- Build a regression model to predict prices using a housing dataset (see the sketch after this list).
- Implement these techniques in Python.
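A sketch of the house-price task, assuming scikit-learn and pandas; the file name home_sales.csv and the columns sqft_living, bedrooms, bathrooms, and price are hypothetical stand-ins for whatever the actual dataset provides.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# "home_sales.csv" and the column names below are hypothetical placeholders.
sales = pd.read_csv("home_sales.csv")
features = ["sqft_living", "bedrooms", "bathrooms"]
X_train, X_test, y_train, y_test = train_test_split(
    sales[features], sales["price"], test_size=0.2, random_state=0)

# Fit on the training split, report error on the held-out split.
model = LinearRegression().fit(X_train, y_train)
rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
print("held-out RMSE:", rmse)
```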
Tasks:
- Describe the input and output of a classification model.
- Tackle both binary and multiclass classification problems.
- Implement a logistic regression model for large-scale classification.
- Create a non-linear model using decision trees.
- Improve the performance of any model using boosting.
- Scale methods with stochastic gradient ascent.
- Describe the underlying decision boundaries.
- Build a classification model to predict sentiment in a product review dataset (see the sketch after this list).
- Analyze financial data to predict loan defaults.
- Use techniques for handling missing data.
- Evaluate models using precision-recall metrics.
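A sketch of sentiment classification with logistic regression on bag-of-words features, evaluated with precision and recall, assuming scikit-learn; the file reviews.csv and its text and binary sentiment columns are hypothetical.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# "reviews.csv" with "text" and 0/1 "sentiment" columns is a hypothetical dataset.
reviews = pd.read_csv("reviews.csv")
X_train, X_test, y_train, y_test = train_test_split(
    reviews["text"], reviews["sentiment"], test_size=0.2, random_state=0)

vectorizer = CountVectorizer()                    # bag-of-words features
classifier = LogisticRegression(max_iter=1000)
classifier.fit(vectorizer.fit_transform(X_train), y_train)

predictions = classifier.predict(vectorizer.transform(X_test))
print("precision:", precision_score(y_test, predictions))
print("recall:   ", recall_score(y_test, predictions))
```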
Models:
- Nearest Neighbors
- Clustering
- Mixture of Gaussians (see the sketch after this list)
- Latent Dirichlet Allocation
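A sketch of fitting a mixture of Gaussians with EM, assuming scikit-learn's GaussianMixture and synthetic data drawn from three well-separated clusters.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic 2-D data drawn from three clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=center, scale=0.5, size=(100, 2))
               for center in (-3.0, 0.0, 3.0)])

# GaussianMixture estimates means, covariances, and mixing weights via EM.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
print("estimated means:\n", gmm.means_)
print("soft assignment of the first point:", gmm.predict_proba(X[:1]))
```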
Algorithms:
- KD-trees
- Locality sensitive hashing
- K-means (see the sketch after this list)
- MapReduce
- Expectation Maximization
- Gibbs sampling
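A NumPy sketch of Lloyd's algorithm for k-means; initialization by sampling k data points and the fixed iteration cap are simplifying assumptions.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment: index of the nearest centroid for every point.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update: each centroid moves to the mean of its assigned points.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)])
        if np.allclose(new_centroids, centroids):   # converged
            break
        centroids = new_centroids
    return centroids, labels

# Toy usage on random 2-D points.
points = np.random.default_rng(1).normal(size=(300, 2))
centers, assignments = kmeans(points, k=3)
print(centers)
```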
Core ML:
- Distance Metrics
- Approximation algorithms
- Unsupervised learning
- Probabilistic modeling
- Data-parallel problems (see the MapReduce sketch after this list)
- Bayesian inference
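A sketch of how the k-means assignment and update steps decompose into map and reduce functions; the serial driver loop below stands in for a real MapReduce framework, which would shard the map calls across workers. The data and centroid initialization are illustrative.

```python
import numpy as np
from collections import defaultdict

def kmeans_map(point, centroids):
    """Map: emit (index of the nearest centroid, (point, 1)) for one point."""
    j = int(np.argmin([np.linalg.norm(point - c) for c in centroids]))
    return j, (point, 1)

def kmeans_reduce(mapped):
    """Reduce: per centroid index, sum the points and counts, then average."""
    sums, counts = {}, defaultdict(int)
    for j, (point, count) in mapped:
        sums[j] = sums.get(j, 0) + point
        counts[j] += count
    return {j: sums[j] / counts[j] for j in sums}

# Serial driver loop; the map calls are independent and could run on separate
# workers. Clusters that end up empty are simply dropped in this sketch.
data = np.random.default_rng(0).normal(size=(200, 2))
centroids = data[:3]
for _ in range(10):
    mapped = (kmeans_map(x, centroids) for x in data)   # map phase
    updated = kmeans_reduce(mapped)                      # reduce phase
    centroids = np.array([updated[j] for j in sorted(updated)])
print(centroids)
```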
Tasks:
- Create a document retrieval system using k-nearest neighbors (see the sketch after this list).
- Identify various similarity metrics for text data.
- Reduce computations in k-nearest neighbor search by using KD-trees.
- Produce approximate nearest neighbors using locality sensitive hashing.
- Compare and contrast supervised and unsupervised learning tasks.
- Cluster documents by topic using k-means.
- Describe how to parallelize k-means using MapReduce.
- Examine probabilistic clustering approaches using mixture models.
- Fit a mixture of Gaussian model using expectation maximization (EM).
- Perform mixed membership modeling using latent Dirichlet allocation (LDA).
- Describe the steps of a Gibbs sampler and how to use its output to draw inferences.
- Compare and contrast initialization techniques for non-convex optimization objectives.
- Implement these techniques in Python.
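A sketch of nearest-neighbor document retrieval over TF-IDF vectors with cosine distance, assuming scikit-learn; the four-document corpus and the query string are illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Tiny illustrative corpus; a real system would index a full document collection.
docs = [
    "the cat sat on the mat",
    "dogs and cats are popular pets",
    "stock markets fell sharply today",
    "investors worry about market volatility",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)      # one TF-IDF vector per document

# Cosine distance on TF-IDF vectors is a standard text similarity metric.
index = NearestNeighbors(n_neighbors=2, metric="cosine").fit(doc_vectors)
query = vectorizer.transform(["market news for investors"])
distances, neighbors = index.kneighbors(query)
for dist, doc_id in zip(distances[0], neighbors[0]):
    print(f"distance={dist:.3f}  {docs[doc_id]}")
```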