beauvilerobed/data-mining-with-python

Notes on Data Mining

1. Getting Started with Data Mining
  • Introducing data mining
  • A simple affinity analysis example
  • What is affinity analysis?
  • Product recommendations
  • Implementing a simple ranking of rules
  • Support
  • Confidence
  • Ranking to find the best rules
  • A simple classification example
  • What is classification?
  • Loading and preparing the dataset
  • Implementing the OneR algorithm
  • The algorithm
  • Testing the algorithm
  • The rule
2. Classifying with scikit-learn Estimators
  • scikit-learn estimators
  • Nearest neighbors
  • Distance metrics
  • Loading the dataset
  • Moving towards a standard workflow
  • Running the algorithm
  • Setting parameters
  • Preprocessing using pipelines
  • An example
  • Standard preprocessing
  • Putting it all together
  • Pipelines
3. Predicting Sports Winners with Decision Trees
  • Loading the dataset
  • Collecting the data
  • Cleaning up the dataset
  • Extracting new features
  • Decision trees
  • Parameters in decision trees
  • Using decision trees
  • Glossary for expanded standings
  • Extra: Model Training Using GridSearch
  • Random forests
  • How do ensembles work?
  • Parameters in random forests
  • Applying random forests
  • Engineering new features (a guide)
4. Recommending Movies Using Affinity Analysis
  • Affinity analysis
  • Algorithms for affinity analysis
  • Choosing parameters
  • The movie recommendation problem
  • Obtaining the dataset
  • Sparse data formats
  • The Apriori implementation
  • The Apriori algorithm
  • Implementation
  • Extracting association rules
  • Evaluation
5. Extracting Features with Transformers
  • Feature extraction
  • Representing reality in models
  • Common feature patterns
  • Creating good features
  • Feature selection
  • Selecting the best individual features
  • Feature creation
  • Remove mixed data types in some columns (a simple approach)
  • Principal Component Analysis
  • Creating your own transformer
  • The transformer API
  • Implementation
  • Unit testing
  • Putting it all together
6. Social Media Insight Using Naive Bayes
  • Disambiguation
  • Downloading data from a social network
  • Loading and classifying the dataset
  • Loading data without the Twitter API
  • Creating a replicable dataset from Twitter
  • Text transformers
  • Bag-of-words
  • N-grams
  • Other features (further reading)
  • Naive Bayes
  • Bayes' theorem
  • A simple example
  • Naive Bayes algorithm
  • How it works
  • Application
  • Extracting word counts
  • Converting dictionaries to a matrix
  • Training the Naive Bayes classifier
  • Putting it all together
  • Evaluation using the F1-score
  • Getting useful features from models
7. Discovering Accounts to Follow Using Graph Mining
  • Creating a graph & building the network
  • Creating a similarity graph
  • Finding subgraphs
  • Connected components
  • Optimizing criteria
8. Beating CAPTCHAs with Neural Networks
  • Artificial neural networks
  • An introduction to neural networks
  • Creating the dataset
  • Splitting the image into individual letters
  • Creating a training dataset
  • Adjusting our training dataset to our methodology
  • Training and classifying
  • Backpropagation
  • Predicting words
  • Possibly improving accuracy using a dictionary
  • Ranking mechanisms for words
  • Putting it all together
9. Authorship Attribution
  • Authorship Attribution
  • Attributing documents to authors
  • Applications and use cases
  • Attributing authorship
  • Getting the data
  • Downloading all the files
  • Function words
  • Counting function words
  • Classifying with function words
  • Support vector machines
  • Classifying with SVMs
  • Kernels
  • Character n-grams
  • Extracting character n-grams
10. Clustering News Articles
  • Generate news articles
  • Create articles with indicators
  • Grouping news articles
  • The k-means algorithm
  • Evaluating the results
  • Extracting topic information from clusters
  • Using clustering algorithms as transformers
  • Clustering ensembles
  • Evidence accumulation
  • How it works
  • Implementation
  • Online learning
  • An introduction to online learning
  • Implementation
11. Classifying Objects in Images Using Deep Learning
  • Object classification
  • Application scenario and goals
  • Use cases
  • Deep neural networks
  • Intuition
  • Implementation
  • Building a Simple Convolutional Neural Network with Keras
  • GPU optimization
  • When to use GPUs for computation

About

Repo for academic and self-learning purposes.
