Python and Spark for Big Data for Banking (PySpark) Certificate for...
Certificate ID: 801971
Authentication Code: cc3ce
Certified Person Name: Aswin Gumilar
Trainer Name: Bagas Prakasa
Duration Days: 2
Duration Hours: 14
Course Name: Python and Spark for Big Data for Banking (PySpark)
Course Date: 2025-03-19 09:00 to 2025-03-20 17:00
Course Outline:
Day 1: Data Processing and Python Essentials
Session 1: Spark DataFrames and Basic Operations
- Working with Spark DataFrames: Implementing Basic Operations
- Groupby and Aggregate Operations
- Handling Timestamps and Dates
- Hands-on Exercise: Data analysis using Spark DataFrames
Session 2: Python Programming for Big Data
- Core Python for Data Handling: Using Variables, Lists, and Functions
- Working with Classes and Files
- Integrating APIs and External Data
- Hands-on Exercise: Building a Python project that processes and analyzes data with PySpark
Day 2: Advanced PySpark and Machine Learning
Session 3: Machine Learning with PySpark
- Implementing Machine Learning with Spark MLlib: Linear and Logistic Regression
- Random Forest Classification Models
- Hands-on Exercise: Building and evaluating machine learning models using PySpark
Session 4: Clustering and Recommender Systems
- K-means Clustering: Theory and Practical Implementation
- Hands-on Exercise: Building a K-means clustering model
- Recommender Systems: Building a recommendation engine with Spark MLlib
- Hands-on Exercise: Recommender system project
Session 5: Spark Streaming and NLP
- Real-Time Data Streaming with Spark: Implementing real-time data processing
- Hands-on Exercise: Streaming data with Spark
- Natural Language Processing (NLP) with PySpark: Implementing basic NLP tasks
- Hands-on Exercise: NLP pipeline using PySpark