Bespoke Python and Spark for Big Data Certificate for Zoe Slade
Certificate ID:
685475
Authentication Code:
16f7d
Certified Person Name:
Zoe Slade
Trainer Name:
Gunnar Bless
Duration Days:
3
Duration Hours:
21
Course Name:
Bespoke Python and Spark for Big Data
Course Date:
25 July 2022 09:30 to 8 August 2022 16:30
Course Outline:
Introduction
Big Data
- Concepts
- Overview of Hadoop
- Overview of Spark
- Overview of PySpark
Setup
- Setting Up PySpark
- Using Amazon Web Services (AWS) EC2 Instances for Spark
- Setting Up Databricks
- Setting Up the AWS EMR Cluster
PySpark - Batch Processing
- PySpark - Intro: Working with RDDs
- PySpark - Intro: Working with DataFrames
- PySpark - Structured API
- PySpark - Datatypes and Operations
- PySpark - Aggregations
- PySpark - Joins
- PySpark - SQL
- PySpark - DataSources
- PySpark/Spark - Operations & Performance
- Spark UI
- Spark History Server
- Some remarks regarding other performance monitoring tools and augmentations
- Some Koalas Remarks
PySpark - Streaming
- DStreams
- Structured Streaming
PySpark - MLlib
- scikit-learn vs MLlib
- Supervised Learning
- Linear Regression
- Decision Trees & Random Forest
- Other examples
- Unsupervised Learning
- Clustering
- PCA
- Recommender Systems
Closing Remarks