HADOOP BIG DATA
About This Course
Course Overview:
Big-Data and Hadoop
- Introduction to big data and Hadoop
- Hadoop Architecture
- Installing Ubuntu with Java 1.8 on VMware Workstation 11
- Hadoop Versioning and Configuration
- Single Node Hadoop 1.2.1 installation on Ubuntu 14.04.1
- Multi Node Hadoop 1.2.1 installation on Ubuntu 14.04.1
- Linux commands and Hadoop commands
- Cluster architecture and block placement
- Modes in Hadoop
- Local Mode
- Pseudo Distributed Mode
- Fully Distributed Mode
- Hadoop Daemons
- Master Daemons (NameNode, Secondary NameNode, JobTracker)
- Slave Daemons (DataNode, TaskTracker)
- Task Instance
- Hadoop HDFS Commands
- Accessing HDFS
- CLI Approach
- Java Approach
Map-Reduce
- Understanding Map Reduce Framework
- Inspiration to Word-Count Example
- Developing Map-Reduce Program using Eclipse Luna
- HDFS Read-Write Process
- Map-Reduce Life Cycle Methods
- Serialization (Java)
- Datatypes
- Comparator and Comparable (Java)
- Custom Output File
- Analysing Temperature dataset using Map-Reduce
- Custom Partitioner & Combiner
- Running Map-Reduce in Local and Pseudo Distributed Mode
Advanced Map-Reduce
- Enum (Java)
- Custom and Dynamic Counters
- Running Map-Reduce in Multi-node Hadoop Cluster
- Custom Writable
- Side Data Distribution
- Using Configuration
- Using DistributedCache
- Using Stringifier
- Input Formatters
- NLine Input Formatter
- XML Input Formatter
- Sorting
- Reverse Sorting
- Secondary Sorting
- Compression Techniques
- Working with Sequence File Format
- Working with Avro File Format
- Testing MapReduce with MRUnit
- Working with NYSE Datasets
- Working with the Million Song Dataset
- Running Map-Reduce in Cloudera Box
HIVE
- Hive Introduction & Installation
- Data Types in Hive
- Commands in Hive
- Exploring Internal and External Tables
- Partitions
- Complex data types
- UDF in Hive
- Built-in UDF
- Custom UDF
- Thrift Server
- Java to Hive Connection
- Joins in Hive
- Working with HWI
- Bucket Map-side Join
- More commands
- View
- Sort By
- Distribute By
- Lateral View
- Running Hive in Cloudera
SQOOP
- Sqoop Installations and Basics
- Importing Data from Oracle to HDFS
- Advanced Imports
- Real-Time Use Case
- Exporting Data from HDFS to Oracle
- Running Sqoop in Cloudera
PIG
- Installation and Introduction
- WordCount in Pig
- NYSE in Pig
- Working With Complex Datatypes
- Pig Schema
- Miscellaneous Commands
- Group
- Filter
- Order
- Distinct
- Join
- Flatten
- Co-group
- Union
- Illustrate
- Explain
- UDFs in Pig
- Parameter Substitution and Dry Run
- Pig Macros
- Running Pig in Cloudera
OOZIE
- Installing Oozie
- Running Map-Reduce with Oozie
- Running Pig and Sqoop with Oozie