Certificate Authentication

Big Data Certificate for HOREA-PETRU MUSTEA

Certificate ID: 759475
Authentication Code: 4d647
Certified Person Name: HOREA-PETRU MUSTEA
Trainer Name: Anna Kotarba
Duration Days: 3
Duration Hours: 21
Course Name: Big Data
Course Date: 3 April 2024 10:00 to 5 April 2024 17:00
Course Outline: 

Working with data using DataFrames and Datasets in PySpark.

Advanced data manipulations with PySpark SQL.

Building ETL (Extract, Transform, Load) pipelines with PySpark.

Data source connectors and file formats in PySpark.

Error handling and data validation in ETL.

Advanced data processing techniques, including window functions and UDFs.

Performance optimization strategies in PySpark.

Introduction to the Databricks platform and environment setup.

Working with Databricks notebooks and collaboration tools.

Cloud data storage - AWS Data Lake overview.

Setting up AWS S3 as a data lake and integrating it with PySpark.

Cloud data storage - Introduction to Azure Synapse Analytics.

Azure Blob Storage for data storage and integration with PySpark.

Running PySpark jobs on cloud platforms.

Scaling and optimizing performance in the cloud.

Cost optimization strategies for cloud data analytics.