Educational notes and hands-on problems with solutions for the Hadoop ecosystem

Pushkr/Apache-Spark-Hands-On

Latest commit: 7a5acde · Jan 22, 2019

For the benefit of the community, please feel free to add or request anything that hasn't been covered. Please remember that this is a beginner's guide, not expert-level documentation.

Hadoop

  • /Flume : notes and examples for Apache Flume
  • /Hive : notes and examples for Apache Hive
  • /MySQL : code samples for creating a database, creating tables, and loading data in MySQL
  • /Sqoop : notes and examples of import/export using Sqoop
  • /spark : notes, documentation, and sample examples for the Spark APIs

Hands-on:

  • /exam : sample CCA-175 exam questions and solutions (in the solution branch)
  • /problem1 : complex data structure handling using Hive (exposure to: Hive, CREATE TABLE, LOAD, named_struct, struct)
  • /problem2 : stock data analysis (exposure to: JSON file handling, Spark SQL, map, reduce, filter, join, groupByKey, keyBy, UDFs, etc.)
  • /problem3 : MovieLens database analysis
  • /problem4 : Lahman's baseball database analysis
  • /problem5 : Hortonworks certification sample, 10 tasks in total
  • /Tweeter : Twitter data analysis
  • /problem6 : retail database sample exercises
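
For readers new to the operations named under /problem2, here is a minimal plain-Python sketch (no Spark required) of what filter, keyBy, groupByKey, and map do conceptually. The sample records and the helper names `key_by` and `group_by_key` are hypothetical and are not taken from the repository; they only mimic the behaviour of the corresponding RDD methods.

```python
from collections import defaultdict

# Hypothetical sample records: (symbol, closing price). Not from the repository.
records = [("AAA", 10.0), ("BBB", 20.0), ("AAA", 14.0), ("BBB", 22.0), ("CCC", 5.0)]


def key_by(items, key_func):
    """Mimic RDD.keyBy: pair every item with a key computed from it."""
    return [(key_func(item), item) for item in items]


def group_by_key(pairs):
    """Mimic RDD.groupByKey: collect all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


# filter: keep only records priced above 8.0.
kept = [r for r in records if r[1] > 8.0]

# keyBy: key each remaining record by its symbol.
keyed = key_by(kept, lambda r: r[0])

# groupByKey, then map each group of records to its average price.
grouped = group_by_key(keyed)
averages = {sym: sum(price for _, price in recs) / len(recs)
            for sym, recs in grouped.items()}

print(averages)
```

In PySpark the same pipeline would typically be written as chained RDD calls (`filter`, `keyBy`, `groupByKey`, `mapValues`) on an RDD created from a SparkContext.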

My answers to a few PySpark questions on Stack Overflow: Link