Author: Hilary Mason (@hmason)

- ML History
- ENIAC
- Turing Test
- ELIZA
- AI Winter
- jmseigler? SexBot (except not)
- Statistics added in the 1990s (revitalizes AI)
- Clustering
- Start with K-means
- Entity disambiguation
- Topic Model
- R has a topic modeling package
- Hilary has Python code
- Recommendations
- Based on existing data of users with similar interests
- Amazon
- Netflix
- Classification
- Train the classifiers
- Bayesian
- Spam Filter
- Facial Recognition
- Dirty Hacks
- Good sources of training data
- Wikipedia
- NY Times
- `lynx --dump <url>` (dumps a page as plain text)
- How to approach a problem (OSEMN)
- Obtain
- Scrub
- Explore
- Model
- iNterpret
- Build a Model
- Probability Theory
- Total probability (area) is 1
- P(A or B) = P(A) + P(B) - P(A and B)
- Bayes' law: P(A|B) = P(B|A) P(A) / P(B)
- Sports down, Math up
- Python using NLTK
- On GitHub
- Feature Selection
- Easy for humans, but not statistically feasible
- Think about what's interesting about the data.
- (Twitter) N-grams, people, presence of link, etc.
- Bit.ly
- Actually a hard problem
- Size indicators
- Billions or trillions of data points
- In-memory DB of everything from the last hour
- Velocity, half-life, prediction
- Location mining
- Cultural analysis based on when & where people are clicking
- Collaborative Filtering
- Tom Mitchell
- Data Mining (Purple Cover)
- Email for resources
- WordNet
- Research benefits of combining models
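
The "Train the classifiers", "Bayesian" and "Bayes Law" items above can be illustrated with a very small naive Bayes text classifier. This is only a sketch in plain Python, not the NLTK code from the talk. The training sentences, the `sports` and `math` labels, and the function names `tokenize`, `train` and `classify` are all invented for illustration, and whitespace tokenization stands in for real feature selection.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude feature extractor."""
    return text.lower().split()


def train(examples):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def classify(text, model):
    """Return the label maximizing log P(label) + sum of log P(word | label)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # Prior: fraction of training documents carrying this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one (Laplace) smoothing so unseen words do not zero out the product.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data, invented for this sketch.
TRAINING = [
    ("the team won the game", "sports"),
    ("the player scored a goal", "sports"),
    ("the theorem has a short proof", "math"),
    ("solve the equation for x", "math"),
]

model = train(TRAINING)
print(classify("a proof of the equation", model))
print(classify("the team scored", model))
```

Scores are summed as log probabilities rather than multiplied directly, which avoids floating-point underflow on longer texts. With real data, the training set would come from sources like those listed above, and the features would go beyond single words.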