Notes from taking Professor Heather Miller's Coursera course "Big Data Analysis with Scala and Spark"
- Data-Parallel to Distributed Data-Parallel
- Latency
- RDDs, Spark's Distributed Collection
- RDDs: Transformation and Actions
- Evaluation in Spark: Unlike Scala Collections!
- Cluster Topology Matters!
- Weekly Summary
- Shuffling: What it is and why it's important
- Partitioning
- Optimizing with Partitioners
- Wide vs Narrow Dependencies
- Weekly Summary
- (In progress) Scala basics
- Scala vs Python in Apache Spark
- Set up Scala
- Scala and Spark Version
- Apache Zeppelin
- Use SBT
- Jupyter Notebook
- https://www.coursera.org/learn/scala-spark-big-data/home/welcome
- Book: "스파크를 활용한 빅데이터 분석" (Korean edition of "Big Data Analytics with Spark"), Mohammed Guller, BJ Public