Big Data Certificate for HOREA-PETRU MUSTEA
Working with data using DataFrames and Datasets in PySpark.
Advanced data manipulations with PySpark SQL.
Building ETL (Extract, Transform, Load) pipelines with PySpark.
Data source connectors and file formats in PySpark.
Error handling and data validation in ETL.
Advanced data processing techniques, including window functions and UDFs.
Performance optimization strategies in PySpark.
Introduction to the Databricks platform and environment setup.
Working with Databricks notebooks and collaboration tools.
Cloud data storage - AWS Data Lake overview.
Setting up AWS S3 as a data lake and integrating it with PySpark.
Cloud data storage - Introduction to Azure Synapse Analytics.
Azure Blob Storage for data storage and integration with PySpark.
Running PySpark jobs on cloud platforms.
Scaling and optimizing performance in the cloud.
Cost optimization strategies for cloud data analytics.