Instructor: Renato Rocha Souza
This is the code repository for the "Introduction to Data Science" class.
This class is about the Data Science process, in which we seek to gain useful predictions and insights from data. Through real-world examples and code snippets, we introduce methods for:
- data munging, scraping, sampling and cleaning in order to get an informative, manageable data set;
- data storage and management, so that data remains accessible even at big-data scale;
- exploratory data analysis (EDA) to generate hypotheses and intuition about the data;
- prediction based on statistical learning tools;
- communication of results through visualization, stories, and interpretable summaries.
Detailed Syllabus:
-
Related Courses cs109, cs229, ML Andrew Ng, free courses
-
Books ref1
-
Data Science Concepts ref1, ref2, ref3, ref4, book1, book2, book3
-
Model Selection ref1
- Feature Engineering ref1, ref2, book
- Automated Feature Engineering featuretools
- Feature Selection ref1, ref2, ref3
- Hyperparameter Search ref1
- Cross Validation ref1, video1
- Oversampling and Undersampling ref1
- Regularization ref1, ref2
- Bias and Variance ref1
- Overfitting and Underfitting ref1, ref2
- Evaluation Metrics and Explainability ref1, ref2, ref3, ref4, ref5, ref6, ref7
-
Machine Learning Algorithms ref1, ref2, ref3, ref4, ref5, ref6, ref7
- Unsupervised ref1
- Supervised
-
Linear Models ref1
-
Bayesian Models
-
k Nearest Neighbors (kNN) ref1
-
Neural Networks and Deep Learning ref1, ref2, ref3, ref4, ref5, ref6, ref7, ref8, ref9, ref10, simple implementation, book, viz, video, meme
- Deep Learning and NLP ref1
-
Neural Network concepts
- General Math
- Linear and dense layers
- Weight Initialization ref1, ref2
- Weight Averaging ref1
- Hyperparameter Tuning ref1, ref2
- Gradient Descent ref1, ref2, ref3, video
- Backpropagation ref1
- Loss Functions ref1
- Convolutional Neural Networks ref1, ref2, ref3, ref4, ref5, ref6, ref6, ref7, ref8, ref9, ref10, architectures, viz
- RNNs (Sequence Models) ref1, ref2, ref3
- Attention Models ref1
- LSTMs and GRUs ref1, ref2, ref3, ref4, ref5, ref6
- Word Embeddings ref1, ref2, ref3, ref4, ref5, ref6, ref6
- Word2vec ref1, ref2, ref3, ref4, ref5, ref6, ref7, ref, video, en-us trained models, pt-br trained models, ge-de trained models, pre-trained models
- Char2vec ref1
- Sentence Embeddings ref1, ref2
- Doc2vec ref1, ref2, ref3
- Beyond Word Embeddings ref1, ref2, ref3, ref4, ref5, ref6, ref7
- GloVe ref1, ref2, ref3
- FastText ref1, ref2
- Misspelling Oblivious Word Embeddings (MOE) ref1, ref2
- Transformers ref1, ref2, ref3, ref4, ref5
- Reinforcement Learning ref1, ref2, ref3, ref4, ref5, ref6, programming resource
- Transfer Learning ref1, ref2, ref3, ref4, ref5, ref6, ref6, ref7
- Autoencoders ref1, ref2, ref3, ref4, ref5
- Generative Adversarial Networks ref1, ref2, ref3, ref4, ref5, ref6, ref7, ref8, GANs and Deepfakes, Colab Notebooks
-
Data Science Tasks
-
NLP Tasks ref1, ref2, ref3, ref4, ref5, ref6, ref7, ref8, ref9, ref10
- Text Classification ref1, ref2, model interpretability, pretrained models, other pretrained models
- Vector Representation ref1, ref2a, ref2b, ref3, ref4
- OCR ref1, ref2
- Topic Modeling ref1, ref2, ref3, ref4, ref5, ref5
- Text Mining and Information Extraction ref1, ref2, ref3, ref4, ref5
- Keyword Extraction and Text Summarization ref1, ref2, ref3, ref4, ref5, ref6, ref7
- Collocation Extraction ref1
- Text Generation ref1
- Regular Expressions ref1, ref2, ref3
- Named Entity Recognition ref1, ref2, ref3, ref4, ref5, ref6, ref7
- Coreference Resolution ref1
- Document and Sentence Similarity ref1, ref2, ref3, ref4
- Sentiment Analysis ref1, ref2, ref3, ref4, ref5, ref6, ref7, ref8, ref9, ref10
- Sarcasm Detection ref1
- Chatbots ref1, ref2
- Labeling ref1, ref2
-
Graphs and Network Analysis ref1, ref2, ref3, ref4, ref5, ref7
-
Time Series Analysis ref1, ref2, ref3, ref4, ref5, ref6, ref7, ref8, ref9, video
-
Recommender Systems ref1, ref2, ref3, ref4, ref5, ref6, ref7, ref8
-
Music Classification ref1
-
Preparing the Environment
-
Versioning Tools
-
Exploratory Data Analysis Tools
-
Machine Learning Tools
-
NLP Tools
-
Visualization Tools ref1
-
Graph Analysis Tools
-
Dashboards and UIs
-
Neural Network Visualization
-
Relational databases and SQL
-
NoSQL / Graph Databases
-
Data Wrangling and Distributed Computing
-
Analytical Pipelines
-
Machine Learning Datasets ref1
This repository uses Git LFS (https://git-lfs.github.com) because the files in /datasets can be large. Install Git LFS before running git clone.
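As a rough illustration of that order of operations, here is a minimal Python sketch. The helper name `clone_with_lfs`, the `<repository-url>` placeholder and the `data-science-course` destination folder are assumptions made for this example, not part of the repository. By default the helper only lists the commands. It runs them only when `dry_run=False`, after checking that the `git-lfs` executable is on the PATH.

```python
import shutil
import subprocess


def clone_with_lfs(repo_url, dest, dry_run=True):
    """Return (and optionally run) the commands for an LFS-aware clone.

    `repo_url` and `dest` are placeholders supplied by the caller
    (hypothetical values; substitute the real repository URL).
    """
    commands = [
        ["git", "lfs", "install"],         # one-time setup of the Git LFS hooks
        ["git", "clone", repo_url, dest],  # LFS-tracked files are fetched on checkout
    ]
    if not dry_run:
        # Fail early if the git-lfs executable is not on PATH.
        if shutil.which("git-lfs") is None:
            raise RuntimeError(
                "git-lfs not found; install it from https://git-lfs.github.com"
            )
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: only print the commands that would be executed.
    for cmd in clone_with_lfs("<repository-url>", "data-science-course"):
        print(" ".join(cmd))
```

The same two commands can be typed directly in a terminal. The wrapper only makes the ordering and the PATH check explicit.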