L | T | P | C |
2 | 0 | 2 | 3 |
- To understand the competitive advantages of data analytics
- To understand the data frameworks
- To learn data analysis methods
- To gain knowledge of Hadoop-related tools such as HBase and Hive for data analytics
Unit I | Introduction to Big Data | 8 |
Big Data: Definition – Characteristic features – Big data applications – Big data vs Traditional data – Risks of big data – Structure of big data; Web data; Evolution of analytic scalability; Modern Data Analytic Tools.
Unit II | Hadoop Framework | 10 |
Distributed File Systems: Large-scale file system organization – HDFS concepts – MapReduce execution – Algorithms using MapReduce.
Unit III | Data Analysis | 9 |
Statistical Methods: Regression modelling – Multivariate analysis; Classification: SVM & Kernel methods – Decision Trees; Linear Classifiers.
Unit IV | NoSQL | 9 |
Introduction to NoSQL – Characteristics of NoSQL – NoSQL Storage Types – Advantages and Drawbacks.
Unit V | Big Data Frameworks | 9 |
MongoDB – Aggregate data models – HBase: Data model and implementations – HBase clients; Hive: Data types and file formats – HiveQL data definition – HiveQL data manipulation – HiveQL queries.
- Applications using MapReduce programming (Examples: word count / frequency programs / matrix multiplication)
- Linear and logistic regression (Loan prediction using Credit approval dataset, Sales prediction using BigMart dataset)
- SVM / Decision tree classification techniques (Flower type classification based on available attributes using Iris dataset, Passenger survival classification using Titanic dataset)
- Clustering (Document categorization by multiclass techniques)
- Visualize data using any plotting framework
- Application that stores data in HBase / MongoDB (Sentiment analysis using Twitter dataset)
Total: 60
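As an illustration of the word count exercise listed among the practical applications, the following is a minimal sketch of the map, shuffle, and reduce phases in plain Python. It runs in a single process and does not use Hadoop; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are assumptions made for this sketch, not part of the prescribed lab material.

```python
from collections import defaultdict


def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the list of counts collected for each word."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Chain the three phases, as a MapReduce job would do across nodes."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, chosen only for illustration.
    sample = ["big data big ideas", "data beats ideas"]
    print(word_count(sample))
```

In an actual Hadoop job the shuffle step is handled by the framework, so only the mapper and reducer would be written by hand.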
After the completion of this course, students will be able to:
- Understand how to leverage the insights from big data analytics (K2)
- Solve applications using statistical and data analytic methods (K3)
- Develop applications using Hadoop-related tools (K4)
- Use database frameworks like MongoDB, Hive and HBase for data analysis (K3)
- Bill Franks, “Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics”, John Wiley & Sons, 2012.
- Michael Berthold, David J. Hand, “Intelligent Data Analysis”, Springer, 2007.
- Tom White, “Hadoop: The Definitive Guide - Storage and Analysis at Internet Scale”, 4th Edition, O’Reilly, 2015.
- Gaurav Vaish, “Getting Started with NoSQL”, Packt Publishing Ltd, 2013.
- E. Capriolo, D. Wampler, and J. Rutherglen, “Programming Hive”, O’Reilly, 2012.
- Lars George, “HBase: The Definitive Guide”, O’Reilly, 2011.
- Kristina Chodorow, “MongoDB: The Definitive Guide – Powerful and Scalable Data Storage”, 2nd Edition, O’Reilly, 2013.