- Overview of Data Analytics and Key Concepts
- Data Science Project Process
- Machine Learning Methodology
- Machine Learning Modeling Example: PP Article Classification Model
- [Slide], [Video 1], [Video 2], [Video 3], [Video 4]
- Logistic Regression Formulation
- Logistic Regression Training: Gradient Descent
- Multinomial Logistic Regression
- [Slide], [Video 1], [Video 2], [Video 3], [Video 4]
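As a companion to the videos above, here is a minimal NumPy sketch of binary logistic regression trained with batch gradient descent. It is a toy illustration, not the course code; the dataset, learning rate, and iteration count are made up for the example.

```python
import numpy as np

def train_logistic_regression(X, y, lr=0.1, n_iter=2000):
    """Fit binary logistic regression by batch gradient descent on the log-loss."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a bias column
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # sigmoid predictions
        grad = X.T @ (p - y) / len(y)      # gradient of the mean log-loss
        w -= lr * grad
    return w

def predict(w, X):
    X = np.hstack([np.ones((X.shape[0], 1)), X])
    return (1.0 / (1.0 + np.exp(-X @ w)) >= 0.5).astype(int)

# Tiny linearly separable example
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
w = train_logistic_regression(X, y)
```

The same gradient form `X.T @ (p - y)` extends to the multinomial case by replacing the sigmoid with a softmax over class scores.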
- Performance Evaluation of Regression Models: MAE, MAPE, MSE, RMSE
- Performance Evaluation of Classification Models: Simple Accuracy, Balanced Accuracy, F1-Score
- Overview of Deep Neural Networks
- Convolutional Neural Networks: the Convolution Operation, Representative CNN Architectures
- [Slide]
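The regression metrics (MAE, MAPE, MSE, RMSE) and classification metrics (simple accuracy, balanced accuracy, F1-score) listed above follow directly from their definitions. The NumPy sketch below is illustrative only, with made-up toy data:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / y_true)) * 100  # assumes no zero targets
    mse = np.mean(err ** 2)
    return mae, mape, mse, np.sqrt(mse)

def classification_metrics(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    acc = (tp + tn) / len(y_true)                   # simple accuracy
    bacc = 0.5 * (tp / (tp + fn) + tn / (tn + fp))  # balanced accuracy
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)              # harmonic mean of P and R
    return acc, bacc, f1

mae, mape, mse, rmse = regression_metrics(np.array([100.0, 200.0]),
                                          np.array([110.0, 190.0]))
acc, bacc, f1 = classification_metrics(np.array([1, 1, 1, 1, 0, 0]),
                                       np.array([1, 1, 1, 0, 0, 1]))
```

Balanced accuracy averages the per-class recalls, which is why it is preferred over simple accuracy on imbalanced data.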
- Recurrent Neural Networks, LSTM, GRU
- Autoencoders
- [Slide]
- Background of Ensemble Learning
- 배깅 & 랜덤 포레스트
- AdaBoost & Gradient Boosting Machine
- [Slide], [Video 1], [Video 2], [Video 3], [Video 4], [Video 5]
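To make the boosting idea concrete, here is a compact NumPy sketch of AdaBoost with decision stumps, the setting usually used to introduce it. This is a toy illustration under simplified assumptions (exhaustive stump search, one feature loop), not the course implementation:

```python
import numpy as np

def stump_predict(X, feat, thresh, polarity):
    return np.where(polarity * X[:, feat] < polarity * thresh, 1, -1)

def fit_stump(X, y, w):
    """Exhaustively pick the stump minimizing the weighted error."""
    best = None
    for feat in range(X.shape[1]):
        for thresh in np.unique(X[:, feat]):
            for polarity in (1, -1):
                pred = stump_predict(X, feat, thresh, polarity)
                err = np.sum(w[pred != y])
                if best is None or err < best[0]:
                    best = (err, feat, thresh, polarity)
    return best

def adaboost(X, y, n_rounds=10):
    n = len(y)
    w = np.full(n, 1.0 / n)          # uniform initial sample weights
    ensemble = []
    for _ in range(n_rounds):
        err, feat, thresh, pol = fit_stump(X, y, w)
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # learner weight
        pred = stump_predict(X, feat, thresh, pol)
        w *= np.exp(-alpha * y * pred)          # up-weight the mistakes
        w /= w.sum()
        ensemble.append((alpha, feat, thresh, pol))
    return ensemble

def ensemble_predict(ensemble, X):
    score = sum(a * stump_predict(X, f, t, p) for a, f, t, p in ensemble)
    return np.sign(score)

# A 1-D "interval" problem no single stump can solve
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([-1, 1, 1, -1])
ensemble = adaboost(X, y, n_rounds=3)
```

Bagging differs only in how the ensemble is built: members are trained independently on bootstrap samples and combined by unweighted voting, while boosting reweights the data sequentially.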
- Anomaly Detection
- Density-based Anomaly Detection
- Model-based Anomaly Detection
- [Slide]
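A minimal density-based scoring sketch: a point's mean distance to its k nearest neighbors serves as an outlier score, so points in sparse regions (low local density) score high. This is a simplified cousin of methods like LOF, with made-up toy data:

```python
import numpy as np

def knn_outlier_scores(X, k=2):
    """Mean distance to the k nearest neighbors as an outlier score."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)       # exclude each point's self-distance
    knn = np.sort(dists, axis=1)[:, :k]   # k smallest distances per point
    return knn.mean(axis=1)

# Four points in a tight cluster plus one far-away outlier
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]])
scores = knn_outlier_scores(X, k=2)
```

LOF refines this idea by normalizing each point's density against the densities of its neighbors, so clusters of different densities are handled fairly.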
- Overview of Clustering and Cluster Validity Indices
- K-Means Clustering
- Hierarchical Clustering
- Density-based Clustering: DBSCAN
- [Slide], [Video 1], [Video 2], [Video 3], [Video 4]
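The K-means loop covered above alternates two steps, assignment and centroid update, until the centers stop moving. A minimal NumPy sketch with made-up toy data (it assumes no cluster goes empty, which holds here because centers are initialized at data points):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # init at data points
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # assignment step: nearest center for each point
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=-1),
                           axis=1)
        # update step: move each center to the mean of its cluster
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Two well-separated blobs
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centers = kmeans(X, k=2)
```

DBSCAN, by contrast, needs no k: it grows clusters from density-reachable core points and leaves sparse points unassigned as noise.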
Topic 1: Introduction to Text Analytics [Slide]
- Text Analytics: Background, Applications, & Challenges [Video]
- Text Analytics Process [Video]
Topic 2: Text Preprocessing [Slide]
- Introduction to Natural Language Processing (NLP) [Video]
- Lexical analysis [Video]
- Syntax analysis & Other topics in NLP [Video]
- Reading materials
- Cambria, E., & White, B. (2014). Jumping NLP curves: A review of natural language processing research. IEEE Computational Intelligence Magazine, 9(2), 48-57. (PDF)
- Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12(Aug), 2493-2537. (PDF)
- Young, T., Hazarika, D., Poria, S., & Cambria, E. (2017). Recent trends in deep learning based natural language processing. arXiv preprint arXiv:1708.02709. (PDF)
- NLP Year in Review - 2019 (Medium Post)
Topic 3: Text Representation I: Classic Methods [Slide]
- Bag of words, Word weighting, N-grams [Video]
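The three classic representations named above (bag of words, TF-IDF word weighting, n-grams) fit in a few lines of standard-library Python. A toy sketch with a made-up corpus, using the common `tf * log(N/df)` weighting (one of several TF-IDF variants):

```python
import math
from collections import Counter

docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs"]

def bow(doc):
    """Bag of words: unordered term counts."""
    return Counter(doc.split())

def tf_idf(docs):
    """Weight each term count by log(N / document frequency)."""
    N = len(docs)
    counts = [bow(d) for d in docs]
    df = Counter()
    for c in counts:
        df.update(c.keys())  # number of documents containing each term
    return [{t: tf * math.log(N / df[t]) for t, tf in c.items()}
            for c in counts]

def ngrams(doc, n=2):
    """Contiguous token n-grams, preserving local word order."""
    toks = doc.split()
    return [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]

weights = tf_idf(docs)
```

Note that "the", appearing in two of three documents, gets a low weight per occurrence, while "cat", unique to one document, gets the maximum weight log(3).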
Topic 5: Text Representation II: Distributed Representation [Slide]
- Neural Network Language Model (NNLM) [Video]
- Word2Vec [Video]
- GloVe [Video]
- FastText, Doc2Vec, and Other Embeddings [Video]
- Reading materials
- Bengio, Y., Ducharme, R., Vincent, P., & Jauvin, C. (2003). A neural probabilistic language model. Journal of Machine Learning Research, 3(Feb), 1137-1155. (PDF)
- Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. (PDF)
- Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111-3119). (PDF)
- Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532-1543). (PDF)
- Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2016). Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606. (PDF)
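To make the skip-gram training loop from the Word2Vec papers concrete, here is a pure-NumPy sketch of skip-gram with one negative sample per positive pair. It is a toy illustration only: the corpus, dimensionality, and hyperparameters are made up, and for brevity the negative sampler draws uniformly and does not exclude the positive word.

```python
import numpy as np

# Toy corpus; real Word2Vec is trained on millions of sentences
corpus = ["the cat sat on the mat".split(),
          "the dog sat on the rug".split()]
vocab = sorted({w for s in corpus for w in s})
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8

rng = np.random.default_rng(0)
W_in = rng.normal(0.0, 0.1, (V, D))   # center-word ("input") vectors
W_out = rng.normal(0.0, 0.1, (V, D))  # context-word ("output") vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr, window = 0.05, 2
for _ in range(100):
    for sent in corpus:
        for i, center in enumerate(sent):
            c = idx[center]
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j == i:
                    continue
                o = idx[sent[j]]
                n = rng.integers(V)  # one uniform negative sample
                # gradients of the negative-sampling objective
                g_pos = sigmoid(W_in[c] @ W_out[o]) - 1.0
                g_neg = sigmoid(W_in[c] @ W_out[n])
                grad_c = g_pos * W_out[o] + g_neg * W_out[n]
                W_out[o] -= lr * g_pos * W_in[c]
                W_out[n] -= lr * g_neg * W_in[c]
                W_in[c] -= lr * grad_c
```

After training, `W_in` holds the word embeddings; words sharing contexts (here "cat"/"dog", "mat"/"rug") are pushed toward similar vectors.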
Topic 6: Dimensionality Reduction [Slide]
- Dimensionality Reduction Overview, Supervised Feature Selection [Video]
- Unsupervised Feature Extraction [Video]
- Reading materials
- Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391-407. (PDF)
- Landauer, T. K., Foltz, P. W., & Laham, D. (1998). An introduction to latent semantic analysis. Discourse Processes, 25(2-3), 259-284. (PDF)
- van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov), 2579-2605. (PDF) (Homepage)
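Latent semantic analysis, the unsupervised feature-extraction method in the readings above, is a truncated SVD of the term-document matrix. A NumPy sketch with a made-up 5-term, 4-document matrix (two "animal" documents, one mixed, one "finance"):

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents
terms = ["cat", "dog", "pet", "stock", "market"]
A = np.array([[2., 0., 1., 0.],
              [1., 1., 0., 0.],
              [1., 2., 1., 0.],
              [0., 0., 2., 1.],
              [0., 0., 1., 2.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                    # number of latent "concepts" kept
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T   # documents in the latent space

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
```

Comparing documents in the k-dimensional latent space rather than the raw term space lets LSA match documents that share concepts but few literal terms.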
- Sequence-to-Sequence Learning [Video]
- Transformer [Video]
- ELMo: Embeddings from Language Models [Video]
- GPT: Generative Pre-Training of a Language Model [Video]
- BERT: Bidirectional Encoder Representations from Transformer [Video]
- GPT-2: Language Models are Unsupervised Multitask Learners [Video]
- Transformer to T5 [Slide], [Video], Presented by Yukyoung Lee.
- Reading materials
- Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in neural information processing systems (pp. 3104-3112). (PDF)
- Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473. (PDF)
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998-6008). (PDF)
- Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. arXiv preprint arXiv:1802.05365. (PDF)
- Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training. (PDF)
- Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. (PDF)
- Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1(8), 9. (PDF)
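The core computation shared by all the Transformer-based models above is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V from Vaswani et al. (2017). A single-head NumPy sketch with random toy inputs and no masking:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, single head, no mask."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise query-key similarities
    # numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# 3 tokens with d_k = 4
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(Q, K, V)
```

Each output row is a convex combination of the value vectors; multi-head attention simply runs several such maps in parallel on learned projections and concatenates the results.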
- Topic Modeling Overview, Latent Semantic Analysis (LSA), & Probabilistic Latent Semantic Analysis (pLSA) [Video]
- LDA: Document Generation Process [Video]
- LDA Inference: Collapsed Gibbs Sampling, LDA Evaluation [Video]
- Reading materials
- Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391-407. (PDF)
- Dumais, S. T. (2004). Latent semantic analysis. Annual Review of Information Science and Technology, 38(1), 188-230.
- Hofmann, T. (1999, July). Probabilistic latent semantic analysis. In Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence (pp. 289-296). Morgan Kaufmann Publishers Inc. (PDF)
- Hofmann, T. (2017, August). Probabilistic latent semantic indexing. In ACM SIGIR Forum (Vol. 51, No. 2, pp. 211-218). ACM.
- Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77-84. (PDF)
- Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993-1022. (PDF)
- Recommended video lectures
- LDA by D. Blei (Lecture Video)
- Variational Inference for LDA by D. Blei (Lecture Video)
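The collapsed Gibbs sampler covered above resamples each token's topic from p(z = k | rest) ∝ (n_dk + α)(n_kw + β)/(n_k + Vβ), where the counts exclude the token itself. A minimal NumPy sketch on a made-up integer-coded corpus; hyperparameters and iteration count are arbitrary:

```python
import numpy as np

def lda_gibbs(docs, V, K, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on integer-coded documents."""
    rng = np.random.default_rng(seed)
    D = len(docs)
    n_dk = np.zeros((D, K))  # topic counts per document
    n_kw = np.zeros((K, V))  # word counts per topic
    n_k = np.zeros(K)        # total tokens per topic
    z = [rng.integers(K, size=len(d)) for d in docs]  # random init
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]  # remove this token's current assignment
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # full conditional over topics for this token
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    # posterior mean estimates of topic-word and document-topic distributions
    phi = (n_kw + beta) / (n_kw.sum(axis=1, keepdims=True) + V * beta)
    theta = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + K * alpha)
    return phi, theta

# Vocabulary of 4 word ids; words {0,1} and {2,3} form two latent topics
docs = [[0, 1, 0, 1, 0], [2, 3, 2, 3, 2], [0, 1, 1, 0], [3, 2, 3, 2]]
phi, theta = lda_gibbs(docs, V=4, K=2)
```

With enough sweeps on data this clean, the sampler typically concentrates words {0,1} and {2,3} into separate topics; evaluation in practice uses held-out perplexity or topic coherence.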