- [en] Stanford lectures on Probability Theory: link
- [en] Matrix calculus notes from Stanford: link
- [en] Derivatives notes from Stanford: link
- [en] The Hundred-Page Machine Learning Book: link (available online, e.g. on GitHub)
- [ru] Excellent lectures by Zhenya Sokolov. Read the PDF, preferably for the most recent year: link
- [en] Naive Bayesian classifier explained: link
- [en] Stanford notes on linear models: link
- [ru] “Handwritten textbook” by students of our course at FIVT: link
- [ru] Vorontsov's lecture notes: link
- [ru] A wonderful book by V.G. Spokoiny on linear estimators: link
- [en] Detailed description of bootstrap procedure: link
- [en] Bias-variance tradeoff in the more general case, "A Unified Bias-Variance Decomposition and its Applications": link
- [en] Great interactive blog post by Alex Rogozhnikov on Gradient Boosting: http://arogozhnikov.github.io/2016/06/24/gradient_boosting_explained.html
- [en] Great gradient boosted trees playground, also by Alex Rogozhnikov: http://arogozhnikov.github.io/2016/07/05/gradient_boosting_playground.html
- [en] SHAP values repo and explanation: https://github.com/slundberg/shap
- [en] Kaggle tutorial on feature importances: https://www.kaggle.com/learn/machine-learning-explainability
- [en] Deep Learning book. A classic that gives a comprehensive overview of almost all key topics in ML and DL. Available online at https://www.deeplearningbook.org
- [en] Notes on vector and matrix derivatives: http://cs231n.stanford.edu/vecDerivs.pdf
- [en] More notes on matrix derivatives from Stanford: link
- [en] Stanford notes on backpropagation: http://cs231n.github.io/optimization-2/
- [en] Stanford notes on different activation functions (and just intuition): http://cs231n.github.io/neural-networks-1/
- [en] Great post on Medium by Andrej Karpathy: https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b
- [en] CS231n notes on data preparation (including batch normalization): http://cs231n.github.io/neural-networks-2/
- [en] CS231n notes on gradient methods: http://cs231n.github.io/neural-networks-3/
- [en] Original paper introducing Batch Normalization: https://arxiv.org/pdf/1502.03167.pdf
- [en] What Every Computer Scientist Should Know About Floating-Point Arithmetic: https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
- [en] The Unreasonable Effectiveness of Recurrent Neural Networks blog post by Andrej Karpathy: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
- [en] Understanding LSTM Networks: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
- [en] CS231n notes on data preparation: http://cs231n.github.io/neural-networks-2/
- [en] Convolutional Neural Networks: Architectures, Convolution / Pooling Layers: http://cs231n.github.io/convolutional-networks/
- [en] Understanding and Visualizing Convolutional Neural Networks: http://cs231n.github.io/understanding-cnn/
- [en] Learning rate warm-up and other useful tricks: article
- [en] Great resource by Lena Voita (direct link to Word Embeddings explanation): https://lena-voita.github.io/nlp_course/word_embeddings.html
- [en] Word2vec tutorial: http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/
- [en] Beautiful post by Jay Alammar on word2vec: http://jalammar.github.io/illustrated-word2vec/
- [en] Blog post on text classification with RNNs and CNNs: https://medium.com/jatana/report-on-text-classification-using-cnn-rnn-han-f0e887214d5f
- [en] Convolutional Neural Networks for Sentence Classification: https://arxiv.org/abs/1408.5882
- [en] Great blog post by Jay Alammar on Transformer: https://jalammar.github.io/illustrated-transformer/
- Notebook on positional encoding: link
- [en] Great Annotated Transformer article with code and comments by Harvard NLP group: https://nlp.seas.harvard.edu/2018/04/03/attention.html
- [en] Harvard NLP full Transformer implementation in PyTorch
- [en] OpenAI blog post Better Language Models and Their Implications (GPT-2)
- [en] Paper describing positional encoding "Convolutional Sequence to Sequence Learning"
- [en] Paper presenting Layer Normalization
- [en] The Illustrated BERT blog post
- [en] DistilBERT overview blog post (distillation will be covered later in our course)
- [en] Google AI Blog post about open sourcing BERT
- [en] One more blog post explaining BERT
- [en] Post about GPT-2 on the OpenAI blog (as of 04.10.2019)
- [en] Introduction to Graph Neural Networks
- [en] Great repo with must-read papers on GNNs
- [en] Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto: link