Machine Learning

Author: Hilary Mason <[email protected]> @hmason
  1. ML History
  • ENIAC
  • Turing Test
  • ELIZA
  • AI Winter
  • jmseigler? SexBot (except not)
  • Statistics added in the 1990s (revitalized AI)
  1. Clustering
  • Start with K-means
  • Entity disambiguation
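As a starting point for the K-means item above, here is a minimal sketch of the assign-then-update loop in plain Python. The sample points, the function name `kmeans`, and the choice to seed the centroids with the first `k` points are all assumptions made for illustration; real code would usually pick starting centroids at random or use a library implementation.

```python
def kmeans(points, k, iterations=10):
    """Cluster 2-D points into k groups with the basic assign/update loop."""
    # Assumption for this sketch: seed the centroids with the first k points.
    centroids = list(points[:k])
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, assignments


# Hypothetical data: two well-separated groups, interleaved.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.9, 8.1)]
centroids, assignments = kmeans(points, k=2)
```

With these points the two groups end up in separate clusters; on messier data the result depends heavily on the starting centroids.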
  1. Topic Model
  • R has a topic modeling module
  • Hilary has Python code
  1. Recommendations
  • Based on existing data of users with similar interests
  • Amazon
  • Netflix
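One way to read "based on existing data of users with similar interests" is user-based collaborative filtering. The sketch below is an illustration under that assumption, not a description of how Amazon or Netflix actually work; the ratings, user names, and the function names `cosine` and `recommend` are invented for the example.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "alice": {"Alien": 5, "Brazil": 4, "Casablanca": 1},
    "bob": {"Alien": 5, "Brazil": 5, "Dune": 4},
    "carol": {"Casablanca": 5, "Emma": 4},
}


def cosine(a, b):
    """Cosine similarity between two sparse rating dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by similarity-weighted ratings."""
    scores = {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine(ratings[user], their_ratings)
        for item, rating in their_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


suggestions = recommend("alice", ratings)
```

Here "Dune" ranks first for alice because bob, whose ratings most resemble hers, rated it highly.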
  1. Classification
  • Train the classifiers
  • Bayesian
    • Spam Filter
    • Facial Recognition
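The "train the classifiers" and "Bayesian spam filter" items can be sketched as a tiny naive Bayes classifier. This is a minimal illustration, assuming whitespace tokenization and add-one smoothing; the training messages and the function names `train` and `classify` are made up for the example.

```python
from collections import Counter
from math import log


def train(examples):
    """Count words per label and documents per label."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the label with the highest log-posterior for the text."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label, counts in word_counts.items():
        # Prior: fraction of training documents with this label.
        score = log(doc_counts[label] / total_docs)
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data.
examples = [
    ("win cash now", "spam"),
    ("cheap pills win prize", "spam"),
    ("meeting notes attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
]
word_counts, doc_counts = train(examples)
label = classify("win a prize now", word_counts, doc_counts)
```

Four messages are far too few for a real filter; the point is only the shape of the train and classify steps.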
  1. Dirty Hacks
  • Good sources of training data
    • Wikipedia
    • NY Times
  • lynx --dump <url>
  1. How to approach
  • Obtain
  • Scrub
  • Explore
  • Model
  • iNterpret
  1. Build a Model
  2. Probability Theory
  • Total probability (the area under the distribution) is 1
  • P(A or B) = P(A) + P(B) - P(A and B)
  • Bayes' law: P(A|B) = P(B|A) P(A) / P(B)
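The three probability rules above can be checked on a small worked example. The sketch below assumes a single roll of a fair six-sided die, with A = "the roll is even" and B = "the roll is greater than 3"; the events and helper names are chosen only for illustration.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (an assumed example).
outcomes = range(1, 7)


def prob(event):
    """Probability of an event given as a predicate over outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def is_even(o):
    return o % 2 == 0


def is_high(o):
    return o > 3


# Probabilities of all single outcomes sum to 1.
total = sum(prob(lambda o, x=x: o == x) for x in outcomes)

p_a = prob(is_even)                                    # {2, 4, 6}
p_b = prob(is_high)                                    # {4, 5, 6}
p_a_and_b = prob(lambda o: is_even(o) and is_high(o))  # {4, 6}
p_a_or_b = prob(lambda o: is_even(o) or is_high(o))    # {2, 4, 5, 6}

# Bayes' law: P(A|B) = P(B|A) * P(A) / P(B)
p_b_given_a = p_a_and_b / p_a
p_a_given_b = p_b_given_a * p_a / p_b
```

Here P(A or B) = 1/2 + 1/2 - 1/3 = 2/3, and Bayes' law gives P(A|B) = 2/3, which matches counting directly: two of the three high rolls are even.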
  1. Twitter
  • Sports down, Math up
  • Python using NLTK
  • On GitHub
  1. Feature Selection
  • Easy for humans, but not statistically feasible
  • Think about what's interesting about the data.
  • (Twitter) N-grams, people, presence of link, etc.
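The Twitter features listed above (n-grams, people, presence of a link) could be extracted along these lines. This is a rough sketch using only regular expressions; the patterns, the function name `extract_features`, and the sample tweet are assumptions, and the talk's own code reportedly used NLTK instead.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def extract_features(tweet, n=2):
    """Turn a tweet into a small dictionary of hand-picked features."""
    has_link = bool(URL_RE.search(tweet))
    mentions = MENTION_RE.findall(tweet)
    # Remove links and mentions before building word n-grams.
    text = MENTION_RE.sub(" ", URL_RE.sub(" ", tweet))
    words = re.findall(r"[a-z0-9']+", text.lower())
    ngrams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return {"has_link": has_link, "mentions": mentions, "ngrams": ngrams}


# Hypothetical tweet.
features = extract_features(
    "Great talk by @hmason on machine learning http://example.com/slides"
)
```

The resulting dictionary can be fed to a classifier such as naive Bayes, with each n-gram treated as a separate feature.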
  1. Bit.ly
  • Actually a hard problem
  • Size indicators
  • Billions or trillions of data points
  • In memory DB of everything within the last hour
  • Velocity, half-life, prediction
  • Location mining
    • Cultural analysis based on when & where people are clicking
  • Collaborative Filtering
  • Tom Mitchell
  • Data Mining (Purple Cover)
  • Email for resources
  • WordNet
  • Research benefits of combining models
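For the "velocity, half-life" item in the Bit.ly notes, one possible reading is sketched below: velocity as the click rate over a trailing time window, and half-life as the time it takes a link to receive half of its observed clicks. Both definitions, the function names, and the timestamps are assumptions for illustration; the notes do not say how Bit.ly computes these.

```python
def velocity(click_times, now, window=3600.0):
    """Clicks per second over the trailing window that ends at `now`."""
    recent = [t for t in click_times if now - window < t <= now]
    return len(recent) / window


def half_life(click_times):
    """Seconds from the first click until half of all observed clicks arrived."""
    ordered = sorted(click_times)
    midpoint = ordered[(len(ordered) - 1) // 2]
    return midpoint - ordered[0]


# Hypothetical click timestamps in seconds: a burst that tails off.
click_times = [0, 10, 20, 40, 80, 160, 320, 640]
rate = velocity(click_times, now=640, window=600.0)
life = half_life(click_times)
```

For this sample, half of the eight clicks arrive within the first 40 seconds, while the remaining four take another ten minutes.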