<<<CP1353>>> INTRODUCTION TO BIG DATA ANALYTICS

{{{credits}}}

L T P C
2 0 2 3

Course Objectives

  • To understand the competitive advantages of data analytics
  • To understand the data frameworks
  • To learn data analysis methods
  • To gain knowledge of Hadoop-related tools such as HBase and Hive for data analytics

{{{unit}}}

Unit I: Introduction to Big Data (8)

Big Data: Definition – Characteristic features – Big data applications – Big data vs Traditional data – Risks of big data – Structure of big data; Web data; Evolution of analytic scalability; Modern Data Analytic Tools.

{{{unit}}}

Unit II: Hadoop Framework (10)

Distributed File Systems: Large-scale file system organization – HDFS concepts – MapReduce execution – Algorithms using MapReduce.

{{{unit}}}

Unit III: Data Analysis (9)

Statistical Methods: Regression modelling – Multivariate analysis; Classification: SVM & Kernel methods – Decision Trees; Linear Classifiers

{{{unit}}}

Unit IV: NoSQL (9)

Introduction to NoSQL – Characteristics of NoSQL – NoSQL Storage Types – Advantages and Drawbacks.

{{{unit}}}

Unit V: Big Data Frameworks (9)

MongoDB – Aggregate data models – HBase: Data model and implementations – HBase clients; Hive: Data types and file formats – HiveQL data definition – HiveQL data manipulation – HiveQL queries

Suggested Experiments

Hadoop

  1. Applications using Map-Reduce programming (Examples: word count / frequency programs / matrix multiplication)
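
As a warm-up before writing a Hadoop job, the word count example can be sketched in plain Python to show the map, shuffle and reduce steps. This is only a single-process illustration, not a Hadoop program: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are hypothetical and chosen for this sketch.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Toy input, invented for illustration only.
sample_lines = ["big data needs big tools", "data tools scale"]
counts = word_count(sample_lines)
print(counts)
```

In a real MapReduce job the mapper and reducer run on different nodes and the framework performs the shuffle; here all three steps run in one process so the data flow is easy to trace.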

R

  1. Linear and logistic regression (Loan prediction using Credit approval dataset, Sales prediction using Bigmart dataset)
  2. SVM / Decision tree classification techniques (Flower type classification based on available attributes using Iris dataset, Passenger survival classification using Titanic dataset)
  3. Clustering (Document categorization by multiclass techniques)
  4. Visualize data using any plotting framework
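
For orientation, the simplest form of the first experiment (linear regression with one predictor) can be sketched as an ordinary least squares fit. The sketch below is in Python rather than R and uses made-up toy numbers, not the Credit approval or Bigmart datasets; `fit_line` and `predict` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict y for a new x using the fitted line."""
    return slope * x + intercept


# Toy data generated from y = 2x + 1 (an assumption for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept, predict(5.0, slope, intercept))
```

The same fit in R is a call to `lm(y ~ x)`; working through the closed-form version once makes the slope and intercept that `lm` reports easier to interpret.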

Database

  1. Application that stores data in HBase / MongoDB (Sentiment analysis using Twitter dataset)

Total: 60

Course Outcomes

After the completion of this course, students will be able to:

  • Understand how to leverage the insights from big data analytics (K2)
  • Solve applications using statistical and data analytic methods (K3)
  • Develop applications using Hadoop-related tools (K4)
  • Use database frameworks like MongoDB, Hive and HBase for data analysis (K3)

References

  1. Bill Franks, “Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics”, John Wiley & Sons, 2012.
  2. Michael Berthold, David J. Hand, “Intelligent Data Analysis”, Springer, 2007.
  3. Tom White, “Hadoop: The Definitive Guide - Storage and Analysis at Internet Scale”, 4th Edition, O’Reilly, 2015.
  4. Gaurav Vaish, “Getting Started with NoSQL”, Packt Publishing Ltd, 2013.
  5. E. Capriolo, D. Wampler, and J. Rutherglen, “Programming Hive”, O’Reilly, 2012.
  6. Lars George, “HBase: The Definitive Guide”, O’Reilly, 2011.
  7. Kristina Chodorow, “MongoDB: The Definitive Guide – Powerful and Scalable Data Storage”, 2nd Edition, O’Reilly, 2013.