Certificate Authentication

Hadoop for Developers and Administrators Certificate for Sunsick Yoon

Certificate ID: 515981
Authentication Code: 510f3
Certified Person Name: Sunsick Yoon
Trainer Name: Fulvio Caruna
Duration Days: 3
Duration Hours: 21
Course Name: Hadoop for Developers and Administrators
Course Date: 7 December 2016 09:30 to 9 December 2016 16:30
Course Outline: 
Module 1. Introduction to Hadoop
The Hadoop Distributed File System (HDFS)
The Read Path and the Write Path
Managing Filesystem Metadata
The Namenode and the Datanode
Namenode High Availability
Namenode Federation
The Command-Line Tools
Understanding REST Support

Module 2. Introduction to MapReduce
Analyzing the Data with Hadoop 
Map and Reduce Pattern
Java MapReduce 
Scaling Out 
Data Flow 
Developing Combiner Functions
Running a Distributed MapReduce Job 

Module 3. Planning a Hadoop Cluster
Picking a Distribution and Version of Hadoop 
Versions and Features
Hardware Selection
Master and Worker Hardware Selection 
Cluster Sizing
Operating System Selection and Preparation
Deployment Layout
Setting up Users, Groups, and Privileges
Disk Configuration
Network Design 

Module 4. Installation and Configuration
Installing Hadoop
Configuration: An Overview 
The Hadoop XML Configuration Files 
Environment Variables and Shell Scripts 
Logging Configuration 
Managing HDFS 
Optimization and Tuning 
Formatting the Namenode 
Creating a /tmp Directory 
Planning for Namenode High Availability
The Fencing Options 
Automatic Failover Configuration 
Formatting and Bootstrapping the Namenodes
Namenode Federation 

Module 5. Understanding Hadoop I/O
Data Integrity in HDFS   
Understanding Codecs 
Compression and Input Splits 
Using Compression in MapReduce 
The Serialization Mechanism
File-Based Data Structures 
The SequenceFile format
Other File Formats and Column-Oriented Formats

Module 6. Developing a MapReduce Application
The Configuration API  
Setting Up the Development Environment 
Managing Configuration
GenericOptionsParser, Tool, and ToolRunner
Writing a Unit Test with MRUnit
The Mapper and Reducer 
Running Locally on Test Data  
Testing the Driver 
Running on a Cluster 
Packaging and Launching a Job 
The MapReduce Web UI 
Tuning a Job

Module 7. Identity, Authentication, and Authorization
Managing Identity
Kerberos and Hadoop 
Understanding Authorization

Module 8. Resource Management
What Is Resource Management? 
HDFS Quotas 
MapReduce Schedulers 
Anatomy of a YARN Application Run 
Resource Requests 
Application Lifespan 
YARN Compared to MapReduce 1 
Scheduling in YARN 
Scheduler Options 
Capacity Scheduler Configuration 
Fair Scheduler Configuration 
Delay Scheduling 
Dominant Resource Fairness 

Module 9. MapReduce Types and Formats
MapReduce Types 
The Default MapReduce Job 
Defining the Input Formats 
Managing Input Splits and Records 
Text Input and Binary Input 
Managing Multiple Inputs 
Database Input (and Output) 
Output Formats 
Text Output and Binary Output
Managing Multiple Outputs 
Database Output

Module 10. Using MapReduce Features
Using Counters 
Reading Built-in Counters 
User-Defined Java Counters 
Understanding Sorting 
Using the Distributed Cache 

Module 11. Cluster Maintenance and Troubleshooting
Managing Hadoop Processes 
Starting and Stopping Processes with Init Scripts 
Starting and Stopping Processes Manually 
HDFS Maintenance Tasks 
Adding a Datanode 
Decommissioning a Datanode 
Checking Filesystem Integrity with fsck 
Balancing HDFS Block Data 
Dealing with a Failed Disk 
MapReduce Maintenance Tasks  
Killing a MapReduce Job 
Killing a MapReduce Task 
Managing Resource Exhaustion 

Module 12. Monitoring
The Available Hadoop Metrics
The Role of SNMP
Health Monitoring 
Host-Level Checks 
HDFS Checks 
MapReduce Checks

Module 13. Backup and Recovery
Data Backup 
Distributed Copy (distcp) 
Parallel Data Ingestion
Namenode Metadata