Certificate Authentication

Apache Spark in the Cloud Certificate for Siew Yee Tai


Certificate ID: 
620383
Authentication Code: 
ab170
Certified Person Name: 
Siew Yee Tai
Trainer Name: 
CHEE SENG LU
Duration Days: 
3
Duration Hours: 
21
Course Name: 
Apache Spark in the Cloud
Course Date: 
28 September 2020 09:00 to 7 October 2020 18:00
Venue: 
Kuala Lumpur
Course Outline:

Introduction:

  • Apache Spark in Hadoop Ecosystem
  • Short introduction to Python and Scala

Basics (theory):

  • Architecture
  • RDD
  • Transformations and Actions
  • Stage, Task, Dependencies

Understanding the basics using the Databricks environment (hands-on workshop):

  • Exercises using RDD API
  • Basic action and transformation functions
  • PairRDD
  • Join
  • Caching strategies
  • Exercises using DataFrame API
  • Spark SQL
  • DataFrame: select, filter, group, sort
  • UDF (User Defined Function)
  • Looking into the Dataset API
  • Streaming

Understanding deployment using the AWS environment (hands-on workshop):

  • Basics of AWS Glue
  • Understand the differences between AWS EMR and AWS Glue
  • Example jobs in both environments
  • Understand pros and cons

Extra:

  • Introduction to Apache Airflow orchestration