<<<PE206>>> FOUNDATIONS OF DATA SCIENCE

{{{credits}}}

L T P C
3 0 0 3

COURSE OBJECTIVES

  • To learn fundamentals of Data Science using Python
  • To understand probability distributions and statistical inference
  • To be familiar with supervised and unsupervised methods in machine learning
  • To explore the algorithms used for analysing massive data problems and social networks
  • To learn about topic and graphical models.

{{{unit}}}

UNIT I DATA SCIENCE AND PYTHON 9

Introduction: Computational tools – Need for data science – Causality and experiments; Array Computing in Python: Vectors – Arrays – Advanced vectorization of functions – Higher-dimensional Arrays: Matrices and arrays; Dictionaries and Strings.

{{{unit}}}

UNIT II PROBABILITY AND STATISTICS 9

Randomness – Empirical Distributions – Testing Hypotheses – Estimation – Why the mean matters – Prediction – Inference for Regression.

{{{unit}}}

UNIT III MACHINE LEARNING 9

Perceptron algorithm – Kernel functions – Overfitting and uniform convergence – Regularization – Support Vector Machines – Strong and weak learning – Stochastic Gradient Descent.

{{{unit}}}

UNIT IV DATA STREAMS AND CLUSTERING 9

Algorithms for Massive Data Problems: Frequency moments of data streams – Matrix algorithms using sampling; Clustering: k-Means clustering – Spectral clustering – Community finding and graph partitioning.

{{{unit}}}

UNIT V TOPIC MODELS AND GRAPHICAL MODELS 9

Topic Models – Nonnegative matrix factorization – Latent Dirichlet allocation – Hidden Markov models – Bayesian Belief Networks – Markov Random Fields.

\hfill Total Periods: 45

COURSE OUTCOMES

After the completion of this course, students will be able to:

  • Develop Python programs to perform analysis on data (K3)
  • Understand various probability distributions and statistical inferences (K2)
  • Develop applications to demonstrate machine learning algorithms in practice (K3)
  • Understand the principles of handling data streams (K2)
  • Discuss topic and graphical modeling techniques in real-world problems (K2).

TEXT BOOKS

  1. Ani Adhikari, John DeNero, “Computational and Inferential Thinking: The Foundations of Data Science”, GitBook, 2017. (Units I, II)
  2. Avrim Blum, John Hopcroft, Ravindran Kannan, “Foundations of Data Science”, preliminary version of a textbook, 2016. (Units III, IV, V)

REFERENCES

  1. Hans Petter Langtangen, “A Primer on Scientific Programming with Python”, 4th Edition, Springer, 2016. (Unit I)
  2. Jonathan Dinu, “Foundations of Data Science: A Practical Introduction to Data Science with Python”, Addison-Wesley Data & Analytics Series, 2016.
  3. Jure Leskovec, Anand Rajaraman, Jeffrey Ullman, “Mining of Massive Datasets”, V2.1, Cambridge University Press, 2014.
  4. EMC Education Services, “Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data”, Wiley publishers, 2015.
  5. Cathy O’Neil, Rachel Schutt, “Doing Data Science: Straight Talk from the Frontline”, O’Reilly, 2014.

CO PO MAPPING

PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2 PSO3
K3 K6 K6 K6 K6 - - - - - - - K6 K5 K6
CO1 K3 3 2 2
CO2 K2 2 1 1
CO3 K3 3 2 2 2
CO4 K2 2 1 1
CO5 K2 2 1 1
Score 12 7 2 7
Course Mapping 3 2 2 2