<<<CP1204>>> DATA ANALYTICS

{{{credits}}}

| L | T | P | C |
| 3 | 0 | 0 | 3 |

Course Objectives

  • To understand the competitive advantages of data analytics.
  • To understand data frameworks.
  • To learn data analysis methods.
  • To learn stream computing.
  • To gain knowledge of Hadoop-related tools such as HBase and Hive for data analytics.

{{{unit}}}

| Unit I | Introduction to Big Data | 8 |

Big Data: Definition – Characteristic features – Big data applications – Big data vs traditional data – Risks of big data – Structure of big data; Web data; Evolution of analytic scalability; Modern Data Analytic Tools: R programming.

{{{unit}}}

| Unit II | Hadoop Framework | 9 |

Distributed File Systems: Large-scale file system organization – HDFS concepts – MapReduce execution – Algorithms using MapReduce.

{{{unit}}}

| Unit III | Data Analysis | 11 |

Statistical Methods: Regression modelling – Multivariate analysis; Classification: SVM and kernel methods; Cluster Analysis: Types of data in cluster analysis – Partitioning methods – Hierarchical methods – Density-based methods – Model-based clustering methods – Clustering high-dimensional data.

{{{unit}}}

| Unit IV | Mining Data Streams | 8 |

Streams: Concepts – Stream data model and architecture – Sampling data in a stream; Mining Data Streams: Filtering streams – Counting distinct elements in a stream – Estimating moments – Counting ones in a window – Decaying window; Real-Time Analytics Platform (RTAP) applications; Case Studies: Real-time sentiment analysis.

{{{unit}}}

| Unit V | Big Data Frameworks | 9 |

Introduction to NoSQL – MongoDB – Aggregate data models – HBase: Data model and implementations – HBase clients; Hive: Data types and file formats – HiveQL data definition – HiveQL data manipulation – HiveQL queries.

\hfill Total: 45

Course Outcomes

After the completion of this course, students will be able to:

  • Understand how to leverage the insights from big data analytics (K2).
  • Solve applications using statistical and data analytic methods (K3).
  • Perform analytics using streaming data (K3).
  • Develop applications using Hadoop-related tools and R Programming (K4).
  • Use database frameworks like MongoDB, Hive and HBase for data analysis (K3).

Course Outcomes (Batch 2021-2023)

After the completion of this course, students will be able to:

  • Explain how to leverage the insights from big data analytics (K2).
  • Solve applications using statistical and data analytic methods (K3).
  • Develop analytics using streaming data (K3).
  • Develop applications using Hadoop-related tools and R Programming (K3).
  • Develop databases using MongoDB, Hive, and HBase for data analysis (K3).

References

  1. Bill Franks, “Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics”, John Wiley & Sons, 2012.
  2. Anand Rajaraman and Jeffrey David Ullman, “Mining of Massive Datasets”, Cambridge University Press, 2012.
  3. Roger D. Peng, “R Programming for Data Science”, 5th Edition, Leanpub, 2016.
  4. Michael Berthold and David J. Hand, “Intelligent Data Analysis”, Springer, 2007.
  5. Tom White, “Hadoop: The Definitive Guide – Storage and Analysis at Internet Scale”, 4th Edition, O’Reilly, 2015.
  6. E. Capriolo, D. Wampler and J. Rutherglen, “Programming Hive”, O’Reilly, 2012.
  7. Lars George, “HBase: The Definitive Guide”, O’Reilly, 2011.
  8. P. J. Sadalage and M. Fowler, “NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence”, Addison-Wesley Professional, 2012.
  9. Kristina Chodorow, “MongoDB: The Definitive Guide – Powerful and Scalable Data Storage”, 2nd Edition, O’Reilly, 2013.

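CO-PO Mapping

| CO             | K-level | PO1 | PO2 | PO3 | PO4 | PO5 | PO6 | PO7 | PO8 | PO9 | PO10 | PO11 |
|----------------+---------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+------|
|                |         | K3  | K6  | K6  | K6  | K6  |     |     |     |     |      |      |
| CO1            | K2      | 2   | 1   | 1   |     |     |     |     |     |     |      |      |
| CO2            | K3      | 3   | 2   | 2   | 2   | 2   |     |     |     |     |      |      |
| CO3            | K3      | 3   | 2   | 2   |     |     |     |     |     |     |      |      |
| CO4            | K4      | 3   | 2   | 2   | 2   | 2   |     |     |     |     |      |      |
| CO5            | K3      | 3   | 2   | 2   | 2   | 2   |     |     |     |     |      |      |
| Total          |         | 14  | 9   | 9   | 6   | 6   |     |     |     |     |      |      |
| Course Mapping |         | 3   | 2   | 2   | 2   | 2   |     |     |     |     |      |      |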