HADOOP ADMIN
About This Course
COURSE DESCRIPTION
As technology and internet access advance, large volumes of data are created every day, which brings businesses new challenges and opportunities in storing and working with that data. Many companies use Hadoop to maintain these large data sets. The Hadoop Admin course provides hands-on knowledge of installing, configuring, and managing the Apache Hadoop platform, including load balancing and security, so that you can operate and maintain a Hadoop cluster. Our Hadoop Administration Training uses many real-world challenges to help you understand the concepts effectively.
Course Curriculum
MODULE 1: LINUX BASICS
- Basic Linux commands such as cat, cd, cp, mv, and find, plus an introduction to AutoSys
- Importance of bash_profile for setting user environment variables.
- Linux user management and permissions: useradd, groupadd, usermod, userdel, chown, chmod
- Monitoring resource usage using top, sar, vmstat, etc.
- Running jobs in the background using nohup
- Job scheduling using crontab
- Linux tools explained: ftp, sftp, scp, repository files, yum repositories, yum install
MODULE 2: INTRODUCTION
- Gen2 Architecture
- Relational databases
- Data Types
- Different Tools
- Pseudo-distributed and Fully Distributed Mode (Lab)
MODULE 3: HDFS ARCHITECTURE
- Understanding the HDFS layer and architecture: NameNode, DataNode, node failures, HDFS commands
- Replication
- Block Size, Block Storage
- Setting Block Size and Replication Factor
- Understanding Image Viewer
- Understanding Edits Viewer
- HDFS Snapshots
- Understanding and Implementing HDFS Federation
- Understanding ViewFs and WebHDFS
- Permissions and Quotas
- HttpFS gateway usage
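As a small illustration of the block size and replication factor topics above, here is a minimal sketch of the arithmetic involved. It assumes a 128 MiB block size and a replication factor of 3, which are common defaults; the real values on a cluster come from its own configuration. The function name `hdfs_footprint` is hypothetical and used only for this example.

```python
MIB = 1024 * 1024

def hdfs_footprint(file_size_bytes, block_size_bytes=128 * MIB, replication=3):
    """Estimate how one file is laid out in HDFS.

    Returns (block_count, raw_bytes):
      block_count - number of logical blocks the file is split into
      raw_bytes   - total disk space used across all replicas
    The defaults are assumptions for illustration, not values read from a cluster.
    """
    if file_size_bytes < 0 or block_size_bytes <= 0 or replication < 1:
        raise ValueError("sizes must be non-negative and replication at least 1")
    # Ceiling division: a partly filled last block still counts as one block.
    block_count = -(-file_size_bytes // block_size_bytes)
    # The last block only occupies as much disk as the data it holds,
    # so raw usage is the file size multiplied by the replication factor.
    raw_bytes = file_size_bytes * replication
    return block_count, raw_bytes

if __name__ == "__main__":
    blocks, raw = hdfs_footprint(1024 * MIB)
    print(f"1 GiB file: {blocks} blocks, {raw // MIB} MiB of raw storage")
```

Under these assumptions, a 300 MiB file occupies three blocks (two full, one partial) and 900 MiB of raw storage.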
MODULE 4: MapReduce FRAMEWORK
- Overview of MapReduce
- Understanding MapReduce
- The Map Phase
- The Reduce Phase
- WordCount in MapReduce
- Running a MapReduce Job
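The map and reduce phases listed above can be sketched in plain Python using the WordCount example. This is a single-process illustration of the idea only, not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and on a real cluster the framework performs the shuffle and runs the phases in parallel across nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)

def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    # Keys are processed in sorted order, mirroring sorted reducer input.
    return dict(reduce_phase(word, counts) for word, counts in sorted(grouped.items()))

if __name__ == "__main__":
    sample = ["Hadoop stores data", "hadoop processes data"]
    for word, total in word_count(sample).items():
        print(word, total)
```

Splitting on whitespace and lowercasing is a simplification chosen for the sketch; punctuation handling is left out.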
MODULE 5: PLANNING HADOOP CLUSTER
- Single-node/Multi-node cluster configuration
- Decide your Cluster Size
- Overview of Hardware and other Network configurations
- Network Topology
- Overview of Cluster Management
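For the cluster sizing topic above, a rough back-of-the-envelope estimate might look like the sketch below. Every input is an assumption for illustration: a replication factor of 3, 25% of disk kept free as headroom for temporary and intermediate data, and a hypothetical usable capacity per DataNode. Real planning also considers growth rate, compression, CPU, and memory, which this sketch ignores. The function name `datanodes_needed` is hypothetical.

```python
import math

def datanodes_needed(data_tb, usable_tb_per_node, replication=3, headroom=0.25):
    """Estimate the DataNode count needed to hold data_tb of user data.

    replication - copies kept of each block (assumed, not read from a cluster)
    headroom    - fraction of raw disk left free for temporary data (assumed)
    """
    if data_tb < 0 or usable_tb_per_node <= 0 or replication < 1:
        raise ValueError("invalid capacity or replication value")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in the range [0, 1)")
    # Raw capacity required once replicas and free headroom are accounted for.
    raw_tb = data_tb * replication / (1 - headroom)
    # Round up: a fraction of a node still means one more machine.
    return math.ceil(raw_tb / usable_tb_per_node)

if __name__ == "__main__":
    nodes = datanodes_needed(data_tb=100, usable_tb_per_node=48)
    print(f"Estimated DataNodes: {nodes}")
```

With these example inputs, 100 TB of data needs 400 TB of raw capacity, which rounds up to 9 nodes of 48 TB each.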
MODULE 6: COMMON ADMINISTRATION TASKS
- Adding DataNodes
- Decommissioning DataNodes
- Rebalancing the cluster
- Cluster Upgrading
- Performance Tuning Parameters
- Mount HDFS to a local file system using the NFS Gateway
- Understanding the usage of Logs
- Backup and Copying Data between clusters using DistCp
- Common Failures
MODULE 7: INSTALLING AND MANAGING HADOOP COMPONENTS SETUP/CONFIGURATION
- Sqoop
- Flume
- Hive setup, hclient, hive shell
- Pig
- HBase setup, HBase shell
- Oozie setup
MODULE 8: ADVANCED CLUSTER CONFIGURATION FEATURES
- Hadoop configuration overview and important configuration files
- Configuration parameters and values
- HDFS parameters and MapReduce parameters
- ‘Include’ and ‘Exclude’ configuration files
- Security
- Troubleshooting
MODULE 9: YARN ARCHITECTURE
- Understanding YARN components
- YARN Architecture
- Implementing YARN in existing architecture
- Understanding Scheduling in YARN
- Understanding and Implementing Fair Scheduler
- Understanding and Implementing Capacity Scheduler
- Resource Manager HA
MODULE 10: ZOOKEEPER ADMINISTRATION AND HDFS NN HA
- Understanding ZooKeeper and its role in HDFS NN HA
- Setting up a ZooKeeper Cluster
- Introducing the ZooKeeper CLI
- High Availability with QJM
MODULE 11: HDP SECURE
- Ingesting data from a database using Sqoop
- Secure an HDP cluster using Ambari
- Set up a Knox gateway