Sebastian Raschka, 2015
Python Machine Learning
A list of references as they appear throughout the chapters.
A BibTeX version for your favorite reference manager is available here.
- F. Galton. Regression towards mediocrity in hereditary stature. Journal of the Anthropological Institute of Great Britain and Ireland, pages 246–263, 1886.
- Python: https://www.python.org
- Installing Python: https://docs.python.org/3/installing/index.html
- Anaconda Scientific Python Distribution: https://store.continuum.io/cshop/anaconda/
- W. S. McCulloch and W. Pitts. A logical calculus of the ideas immanent in nervous activity. The bulletin of mathematical biophysics, 5(4):115–133, 1943.
- F. Rosenblatt. The perceptron, a perceiving and recognizing automaton (Project Para). Cornell Aeronautical Laboratory, 1957.
- B. Widrow. Adaptive "Adaline" neuron using chemical "memistors". Technical Report 1553-2, Stanford Electron. Labs., Stanford, CA, October 1960.
- NumPy Tutorial: http://wiki.scipy.org/Tentative_NumPy_Tutorial
- Pandas Tutorial: http://pandas.pydata.org/pandas-docs/stable/tutorials.html
- Matplotlib Tutorial: http://matplotlib.org/users/beginner.html
- IPython Notebook: https://ipython.org/ipython-doc/3/notebook/index.html
- BLAS (Basic Linear Algebra Subprograms): http://www.netlib.org/blas/
- LAPACK (Linear Algebra PACKage): http://www.netlib.org/lapack/
- UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/
- Iris dataset: https://archive.ics.uci.edu/ml/datasets/Iris
- Z. Kolter. Linear algebra review and reference, 2008.
- D. H. Wolpert and W. G. Macready. No free lunch theorems for optimization. Evolutionary Computation, IEEE Transactions on, 1(1):67–82, 1997.
- D. H. Wolpert. The supervised learning no-free-lunch theorems. In Soft Computing and Industry, pages 25–42. Springer, 2002.
- S. Menard. Logistic regression: From introductory to advanced concepts and applications. Sage Publications, 2009.
- V. Vapnik. The nature of statistical learning theory. Springer Science & Business Media, 2013.
- C. J. Burges. A tutorial on support vector machines for pattern recognition. Data mining and knowledge discovery, 2(2):121–167, 1998.
- J. H. Friedman, J. L. Bentley, and R. A. Finkel. An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software (TOMS), 3(3):209–226, 1977.
- scikit-learn: http://scikit-learn.org/stable/
- LIBLINEAR -- A Library for Large Linear Classification: http://www.csie.ntu.edu.tw/~cjlin/liblinear/
- LIBSVM -- A Library for Support Vector Machines: https://www.csie.ntu.edu.tw/~cjlin/libsvm/
- Graphviz - Graph Visualization Software: http://www.graphviz.org
- L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and regression trees. Wadsworth, Belmont, CA, 1984.
- L. Breiman. Random forests. Machine learning, 45(1):5–32, 2001.
- P. Cunningham and S. J. Delany. k-nearest neighbour classifiers. Multiple Classifier Systems, pages 1–17, 2007.
- T. Hastie, J. Friedman, and R. Tibshirani. The Elements of Statistical Learning, volume 2. Springer, 2009. Section 3.4.
- F. Ferri, P. Pudil, M. Hatef, and J. Kittler. Comparative study of techniques for large-scale feature selection. Pattern Recognition in Practice IV, pages 403–413, 1994.
- Wine Data Set: https://archive.ics.uci.edu/ml/datasets/Wine
- One-hot encoding: https://en.wikipedia.org/wiki/One-hot
- M. Y. Park and T. Hastie. L1-regularization path algorithm for generalized linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(4):659–677, 2007.
- A. Y. Ng. Feature selection, L1 vs. L2 regularization, and rotational invariance. In Proceedings of the twenty-first international conference on Machine learning, page 78. ACM, 2004.
- D. W. Aha and R. L. Bankert. A comparative evaluation of sequential feature selection algorithms. In Learning from Data, pages 199–206. Springer, 1996.
- C. Strobl, A.-L. Boulesteix, A. Zeileis, and T. Hothorn. Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC bioinformatics, 8(1):25, 2007.
- R. A. Fisher. The use of multiple measurements in taxonomic problems. Annals of eugenics, 7(2):179–188, 1936.
- C. R. Rao. The utilization of multiple measurements in problems of biological classification. Journal of the Royal Statistical Society. Series B (Methodological), 10(2):159–203, 1948.
- A. M. Martinez and A. C. Kak. PCA versus LDA. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 23(2):228–233, 2001.
- R. O. Duda, P. E. Hart, and D. G. Stork. Pattern classification. 2nd Edition. New York, 2001.
- B. Schoelkopf, A. Smola, and K.-R. Mueller. Kernel principal component analysis. pages 583–588, 1997.
- I. Jolliffe. Principal component analysis. Wiley Online Library, 2002.
- Manifold learning algorithms: https://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction#Manifold_learning_algorithms
- J. Shawe-Taylor and N. Cristianini. Kernel methods for pattern analysis. Cambridge university press, 2004.
- R. Kohavi et al. A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI, volume 14, pages 1137–1145, 1995.
- M. Markatou, H. Tian, S. Biswas, and G. M. Hripcsak. Analysis of variance of cross-validation estimators of the generalization error. Journal of Machine Learning Research, 6:1127–1168, 2005.
- B. Efron and R. Tibshirani. Improvements on cross-validation: the 632+ bootstrap method. Journal of the American Statistical Association, 92(438):548–560, 1997.
- S. Varma and R. Simon. Bias in error estimation when using cross-validation for model selection. BMC bioinformatics, 7(1):91, 2006.
- Breast Cancer Wisconsin dataset: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
- Y. Bengio and Y. Grandvalet. No unbiased estimator of the variance of k-fold cross-validation. The Journal of Machine Learning Research, 5:1089–1105, 2004.
- S. Raschka. An overview of general performance metrics of binary classifier systems. Computing Research Repository (CoRR), abs/1410.5330, 2014.
- J. A. Hanley and B. J. McNeil. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1):29–36, 1982.
- J. Davis and M. Goadrich. The relationship between precision-recall and ROC curves. In Proceedings of the 23rd international conference on Machine learning, pages 233–240. ACM, 2006.
- J. Bergstra and Y. Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13(1):281–305, 2012.
- D. H. Wolpert. Stacked generalization. Neural networks, 5(2):241–259, 1992.
- L. Breiman. Bagging predictors. Machine learning, 24(2):123–140, 1996.
- R. E. Schapire. The strength of weak learnability. Machine learning, 5(2):197–227, 1990.
- Y. Freund, R. E. Schapire, et al. Experiments with a new boosting algorithm. In ICML, volume 96, pages 148–156, 1996.
- L. Breiman. Bias, variance, and arcing classifiers. 1996.
- G. Raetsch, T. Onoda, and K. R. Mueller. An improvement of AdaBoost to avoid overfitting. In Proc. of the Int. Conf. on Neural Information Processing. Citeseer, 1998.
- A. Toescher, M. Jahrer, and R. M. Bell. The BigChaos solution to the Netflix Grand Prize. Netflix Prize documentation, 2009.
- Netflix Recommendations: Beyond the 5 stars (Part 1): http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html
- K. M. Ting and I. H. Witten. Issues in stacked generalization. J. Artif. Intell. Res. (JAIR), 10:271–289, 1999.
- J. H. Friedman. Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4):367–378, 2002.
- A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 142–150, Portland, Oregon, USA, June 2011. Association for Computational Linguistics.
- I. Kanaris, K. Kanaris, I. Houvardas, and E. Stamatatos. Words versus character n-grams for anti-spam filtering. International Journal on Artificial Intelligence Tools, 16(06):1047–1067, 2007.
- S. Raschka. Naive Bayes and text classification I - introduction and theory. Computing Research Repository (CoRR), abs/1410.5329, 2014.
- S. Bird, E. Klein, and E. Loper. Natural language processing with Python. O'Reilly Media, Inc., 2009.
- M. F. Porter. An algorithm for suffix stripping. Program: electronic library and information systems, 14(3):130–137, 1980.
- M. Toman, R. Tesar, and K. Jezek. Influence of word normalization on text classification. Proceedings of InSciT, pages 354–358, 2006.
- A. Appleby. MurmurHash3, 2011.
- T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
- Review dataset: http://ai.stanford.edu/~amaas/data/sentiment/
- Google regex Tutorial: https://developers.google.com/edu/python/regular-expressions
- Natural Language Toolkit: http://www.nltk.org
- Google Word2Vec: https://code.google.com/p/word2vec/
- A. Aizawa. An information-theoretic perspective of tf–idf measures. Information Processing & Management, 39(1):45–65, 2003.
- M. F. Porter. Snowball: A language for stemming algorithms, 2001.
- C. D. Paice. Method for evaluation of stemming algorithms based on error counting. Journal of the American Society for Information Science, 47(8):632–649, 1996.
- Flask: http://flask.pocoo.org
- SQLite: http://www.sqlite.org
- SQLite Manager Add-on: https://addons.mozilla.org/en-US/firefox/addon/sqlite-manager/
- Jinja2: http://jinja.pocoo.org
- Webapp example: http://raschkas.pythonanywhere.com
- PythonAnywhere: https://www.pythonanywhere.com
- HTTP Methods: GET vs. POST: http://www.w3schools.com/tags/ref_httpmethods.asp
- A. I. Khuri. Introduction to linear regression analysis, by Douglas C. Montgomery, Elizabeth A. Peck, G. Geoffrey Vining. International Statistical Review, 81(2):318–319, 2013.
- D. S. G. Pollock. The Classical Linear Regression Model.
- R. Toldo and A. Fusiello. Automatic estimation of the inlier threshold in robust multiple structures fitting. In Image Analysis and Processing – ICIAP 2009, pages 123–131. Springer, 2009.
- Housing dataset: https://archive.ics.uci.edu/ml/datasets/Housing
- J. W. Tukey. Exploratory data analysis. 1977.
- I. Lawrence and K. Lin. A concordance correlation coefficient to evaluate reproducibility. Biometrics, pages 255–268, 1989.
- N. J. Nagelkerke. A note on a general definition of the coefficient of determination. Biometrika, 78(3):691–692, 1991.
- P. Meer, D. Mintz, A. Rosenfeld, and D. Y. Kim. Robust regression methods for computer vision: A review. International journal of computer vision, 6(1):59–70, 1991.
- A. E. Hoerl and R. W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1):55–67, 1970.
- R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pages 267–288, 1996.
- A. Liaw and M. Wiener. Classification and regression by randomForest. R news, 2(3):18–22, 2002.
- G. Louppe. Understanding random forests: From theory to practice. arXiv preprint arXiv:1407.7502, 2014.
- G. Louppe, L. Wehenkel, A. Sutera, and P. Geurts. Understanding variable importances in forests of randomized trees. In Advances in Neural Information Processing Systems, pages 431–439, 2013.
- S. R. Gunn et al. Support vector machines for classification and regression. ISIS technical report, 14, 1998.
- D. Arthur and S. Vassilvitskii. k-means++: The advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pages 1027–1035. Society for Industrial and Applied Mathematics, 2007.
- J. C. Dunn. A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. 1973.
- J. C. Bezdek. Pattern recognition with fuzzy objective function algorithms. Springer Science & Business Media, 2013.
- S. Ghosh and S. K. Dubey. Comparative analysis of k-means and fuzzy c-means algorithms. IJACSA, 4:35–38, 2013.
- M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, volume 96, pages 226–231, 1996.
- Z. Huang. Extensions to the k-means algorithm for clustering large data sets with categorical values. Data mining and knowledge discovery, 2(3):283–304, 1998.
- C. Ding and X. He. K-means clustering via principal component analysis. In Proceedings of the twenty-first international conference on Machine learning, page 29. ACM, 2004.
- Y. Ding, Y. Zhao, X. Shen, M. Musuvathi, and T. Mytkowicz. Yinyang k-means: A drop-in replacement of the classic k-means with consistent speedup. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pages 579–587, 2015.
- P. J. Rousseeuw. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of computational and applied mathematics, 20:53–65, 1987.
- J.-P. Rasson and T. Kubushishi. The gap test: an optimal method for determining the number of natural classes in cluster analysis. In New approaches in classification and data analysis, pages 186–193. Springer, 1994.
- S. C. Johnson. Hierarchical clustering schemes. Psychometrika, 32(3):241–254, 1967.
- D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning representations by back-propagating errors. Nature, 323:533–536, 1986.
- Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance in face verification. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 1701–1708. IEEE, 2014.
- A. Hannun, C. Case, J. Casper, B. Catanzaro, G. Diamos, E. Elsen, R. Prenger, S. Satheesh, S. Sengupta, A. Coates, et al. Deep Speech: Scaling up end-to-end speech recognition. arXiv preprint arXiv:1412.5567, 2014.
- T. Unterthiner, A. Mayr, G. Klambauer, and S. Hochreiter. Toxicity prediction using deep learning. arXiv preprint arXiv:1503.01445, 2015.
- T. Hastie, J. Friedman, and R. Tibshirani. The Elements of Statistical Learning, volume 2. Springer, 2009.
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
- A. G. Baydin and B. A. Pearlmutter. Automatic differentiation of algorithms for machine learning. arXiv preprint arXiv:1404.7456, 2014.
- Y. Bengio. Learning deep architectures for AI. Foundations and trends in Machine Learning, 2(1):1–127, 2009.
- P. Y. Simard, D. Steinkraus, and J. C. Platt. Best practices for convolutional neural networks applied to visual document analysis. In Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR), page 958. IEEE, 2003.
- S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
- C. M. Bishop. Neural networks for pattern recognition. Oxford university press, 1995.
- "How Google Translate squeezes deep learning onto a phone": http://googleresearch.blogspot.com/2015/07/how-google-translate-squeezes-deep.html
- Article about Endianness: https://en.wikipedia.org/wiki/Endianness
- Automatic differentiation: https://en.wikipedia.org/wiki/Automatic_differentiation
- J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio. Theano: A CPU and GPU math compiler in Python. In Proc. 9th Python in Science Conf., pages 1–7, 2010.
- LISA Lab: http://lisa.iro.umontreal.ca
- Keras: http://keras.io
- Geoff Hinton: http://www.cs.toronto.edu/~hinton/
- Andrew Ng: http://www.andrewng.org
- Yann LeCun: http://yann.lecun.com
- Juergen Schmidhuber: http://people.idsia.ch/~juergen/
- Yoshua Bengio: http://www.iro.umontreal.ca/~bengioy
- Symbolic Computation: https://en.wikipedia.org/wiki/Symbolic_computation
- What Every Programmer Should Know About Floating-Point Arithmetic: http://floating-point-gui.de