Certificate Authentication

Bespoke Python and Spark for Big Data Certificate for Zoe Slade

Certificate ID: 685475
Authentication Code: 16f7d
Certified Person Name: Zoe Slade
Trainer Name: Gunnar Bless
Duration Days: 3
Duration Hours: 21
Course Name: Bespoke Python and Spark for Big Data
Course Date: 2022-07-25 09:30 to 2022-08-08 16:30
Course Outline: 

Introduction

Big Data

  • Concepts
  • Overview of Hadoop
  • Overview of Spark
  • Overview of PySpark

Setup

  • Setting Up PySpark
  • Using Amazon Web Services (AWS) EC2 Instances for Spark
    • Setting Up Databricks
    • Setting Up the AWS EMR Cluster

PySpark - Batch Processing

  • PySpark - Intro: Working with RDDs
  • PySpark - Intro: Working with Dataframes
  • PySpark - Structured API
  • PySpark - Datatypes and Operations
  • PySpark - Aggregations
  • PySpark - Joins
  • PySpark - SQL
  • PySpark - DataSources
  • PySpark/Spark - Operations & Performance
    • Spark UI
    • Spark History Server
    • Remarks on other performance monitoring tools and augmentations
  • Remarks on Koalas

PySpark - Streaming

  • DStreams
  • Structured Streaming

PySpark - MLlib

  • scikit-learn vs MLlib
  • Supervised Learning
    • Linear Regressions
    • Decision Trees & Random Forest
    • Other examples
  • Unsupervised Learning
    • Clustering
    • PCA
  • Recommender Systems

Closing Remarks