1. Getting Started with Data Mining
- Introducing data mining
- A simple affinity analysis example
- What is affinity analysis?
- Product recommendations
- Implementing a simple ranking of rules
- Support
- Confidence
- Ranking to find the best rules
- A simple classification example
- What is classification?
- Loading and preparing the dataset
- Implementing the OneR algorithm
- The algorithm
- Testing the algorithm
- The rule
2. Classifying with scikit-learn Estimators
- scikit-learn estimators
- Nearest neighbors
- Distance metrics
- Loading the dataset
- Moving towards a standard workflow
- Running the algorithm
- Setting parameters
- Preprocessing using pipelines
- An example
- Standard preprocessing
- Putting it all together
- Pipelines
3. Predicting Sports Winners with Decision Trees
- Loading the dataset
- Collecting the data
- Cleaning up the dataset
- Extracting new features
- Decision trees
- Parameters in decision trees
- Using decision trees
- Glossary for expanded standings
- Extra: Model training using grid search
- Random forests
- How do ensembles work?
- Parameters in random forests
- Applying random forests
- Engineering new features (a guide)
4. Recommending Movies Using Affinity Analysis
- Affinity analysis
- Algorithms for affinity analysis
- Choosing parameters
- The movie recommendation problem
- Obtaining the dataset
- Sparse data formats
- The Apriori implementation
- The Apriori algorithm
- Implementation
- Extracting association rules
- Evaluation
5. Extracting Features with Transformers
- Feature extraction
- Representing reality in models
- Common feature patterns
- Creating good features
- Feature selection
- Selecting the best individual features
- Feature creation
- Removing mixed data types from some columns (a simple approach)
- Principal Component Analysis
- Creating your own transformer
- The transformer API
- Implementation
- Unit testing
- Putting it all together
6. Social Media Insight Using Naive Bayes
- Disambiguation
- Downloading data from a social network
- Loading and classifying the dataset
- Loading data without the Twitter API
- Creating a replicable dataset from Twitter
- Text transformers
- Bag-of-words
- N-grams
- Other features (further reading)
- Naive Bayes
- Bayes' theorem
- A simple example
- Naive Bayes algorithm
- How it works
- Application
- Extracting word counts
- Converting dictionaries to a matrix
- Training the Naive Bayes classifier
- Putting it all together
- Evaluation using the F1-score
- Getting useful features from models
7. Discovering Accounts to Follow Using Graph Mining
- Creating a graph and building the network
- Creating a similarity graph
- Finding subgraphs
- Connected components
- Optimizing criteria
8. Beating CAPTCHAs with Neural Networks
- Artificial neural networks
- An introduction to neural networks
- Creating the dataset
- Splitting the image into individual letters
- Creating a training dataset
- Adjusting our training dataset to our methodology
- Training and classifying
- Backpropagation
- Predicting words
- Possibly improving accuracy using a dictionary
- Ranking mechanisms for words
- Putting it all together
9. Authorship Attribution
- Authorship attribution
- Attributing documents to authors
- Applications and use cases
- Attributing authorship
- Getting the data
- Downloading all the files
- Function words
- Counting function words
- Classifying with function words
- Support vector machines
- Classifying with SVMs
- Kernels
- Character n-grams
- Extracting character n-grams
10. Clustering News Articles
- Generating news articles
- Creating articles with indicators
- Grouping news articles
- The k-means algorithm
- Evaluating the results
- Extracting topic information from clusters
- Using clustering algorithms as transformers
- Clustering ensembles
- Evidence accumulation
- How it works
- Implementation
- Online learning
- An introduction to online learning
- Implementation
11. Classifying Objects in Images Using Deep Learning
- Object classification
- Application scenario and goals
- Use cases
- Deep neural networks
- Intuition
- Implementation
- Building a simple convolutional neural network with Keras
- GPU optimization
- When to use GPUs for computation