Apache Spark in the Cloud Certificate for Siew Yee Tai
Certificate ID:
620383
Authentication Code:
ab170
Certified Person Name:
Siew Yee Tai
Trainer Name:
CHEE SENG LU
Duration Days:
3
Duration Hours:
21
Course Name:
Apache Spark in the Cloud
Course Date:
28 September 2020 09:00 to 7 October 2020 18:00
Venue:
Kuala Lumpur
Course Outline:
Introduction:
- Apache Spark in the Hadoop ecosystem
- Short introduction to Python and Scala
Basics (theory):
- Architecture
- RDD
- Transformations and Actions
- Stage, Task, Dependencies
Understanding the basics using the Databricks environment (hands-on workshop):
- Exercises using RDD API
  - Basic action and transformation functions
  - PairRDD
  - Join
  - Caching strategies
- Exercises using DataFrame API
  - SparkSQL
  - DataFrame: select, filter, group, sort
  - UDF (User Defined Function)
- A look at the Dataset API
- Streaming
Understanding deployment using the AWS environment (hands-on workshop):
- Basics of AWS Glue
- Understand the differences between AWS EMR and AWS Glue
- Example jobs on both environments
- Understand the pros and cons of each
Extra:
- Introduction to Apache Airflow orchestration