Hadoop Course

 

Hadoop Course Overview

Hadoop is a large-scale distributed batch processing infrastructure designed to distribute large amounts of work efficiently across a set of machines. It includes a distributed file system that breaks up input data and sends portions of the original data to several machines in your cluster to hold. As a result, a problem can be processed in parallel using all of the machines in the cluster, and output results are computed as efficiently as possible.
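To make the split-and-process-in-parallel idea concrete, here is a minimal sketch in plain Java. It does not use the Hadoop API, and every name in it (`WordCountSketch`, `split`, `mapChunk`, `reduce`, `run`) is hypothetical and chosen only for illustration. It assumes a word-count task: the input lines are divided into chunks, each chunk is counted independently (as separate machines would do), and the partial counts are then merged into one result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java illustration of the split -> map -> reduce flow.
// This is NOT the Hadoop API; all names here are hypothetical.
class WordCountSketch {

    // Split the input into chunks, one per simulated machine (round-robin).
    static List<List<String>> split(List<String> lines, int machines) {
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < machines; i++) {
            chunks.add(new ArrayList<>());
        }
        for (int i = 0; i < lines.size(); i++) {
            chunks.get(i % machines).add(lines.get(i));
        }
        return chunks;
    }

    // Map step (with local combining): count the words in one chunk.
    static Map<String, Integer> mapChunk(List<String> chunk) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : chunk) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Reduce step: merge the partial counts from every chunk.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> total.merge(word, count, Integer::sum));
        }
        return total;
    }

    // Run the whole flow; chunks are processed in parallel threads,
    // standing in for the machines of a cluster.
    static Map<String, Integer> run(List<String> lines, int machines) {
        List<Map<String, Integer>> partials = split(lines, machines)
                .parallelStream()
                .map(WordCountSketch::mapChunk)
                .collect(Collectors.toList());
        return reduce(partials);
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "big data needs big clusters",
                "hadoop stores data",
                "hadoop processes data");
        System.out.println(run(input, 2));
    }
}
```

The final counts do not depend on how many chunks the input is split into, which is the property that lets a real cluster add machines without changing the answer. In Hadoop itself, the splitting is handled by the distributed file system and the framework rather than by application code like this.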

Hadoop consists of several modules for handling data in integrated systems: the Hadoop Distributed File System (HDFS™), Hadoop YARN, Hadoop MapReduce, and Hadoop Common.

One way to define big data is as data that is too big to be processed by relational database management systems (RDBMS). Hadoop helps overcome these RDBMS limitations so that big data can be processed.

 

Hadoop is a framework used to develop data processing applications that are executed in a distributed computing environment.

Hadoop’s distributed computing model processes big data fast. The more computing nodes you use, the more processing power you have. The open-source framework is free and uses commodity hardware to store large quantities of data.

This course is mainly intended for big data analysts, Hadoop developers, administrators, and testers.

Participants should have basic database and programming knowledge.

With Oracle SQL skills, you can be hired by major IT companies such as Google, Facebook, Monster, Amazon, and Bank of America as a developer, application programmer, administrator, or database consultant.

This tutorial covers the Hadoop ecosystem, the Hadoop Java API for MapReduce, overviews of Hive, Pig, Sqoop, and Flume, moving data from a web server into Hadoop, Apache Hadoop installation, monitoring the Hadoop cluster, and Hadoop configuration management tools.

Hadoop Course Syllabus

Introduction to Hadoop

  • Hadoop Distributed File System
  • Hadoop Architecture
  • MapReduce & HDFS

Hadoop Ecosystem

  • Introduction to Pig
  • Introduction to Hive
  • Introduction to HBase
  • Map of the Other Ecosystem Components

Hadoop Developer

  • Moving Data into Hadoop
  • Moving Data out of Hadoop
  • Reading and Writing Files in HDFS Using a Java Program

The Hadoop Java API for MapReduce

  • Mapper Class
  • Reducer Class
  • Driver Class
  • Writing a Basic MapReduce Program in Java
  • Understanding the MapReduce Internal Components
  • HBase MapReduce Program

Hive Overview

  • Working with Hive

Pig Overview

  • Working with Pig

Sqoop Overview

  • Moving Data from RDBMS to Hadoop
  • Moving Data from RDBMS to HBase
  • Moving Data from RDBMS to Hive

Flume Overview

Moving Data from a Web Server into Hadoop

  • Real-Time Example in Hadoop
  • Apache Log Viewer Analysis
  • Market Basket Algorithms

 

HADOOP ADMIN TRAINING

Introduction

  • Big Data Overview
  • Introduction to Hadoop and the Hadoop Ecosystem
  • Choosing Hardware for Hadoop Cluster Nodes

Apache Hadoop Installation

  • Standalone Mode
  • Pseudo Distributed Mode
  • Fully Distributed Mode

Installing the Hadoop Ecosystem and Integrating It with Hadoop

  • ZooKeeper Installation
  • HBase Installation
  • Hive Installation
  • Pig Installation
  • Sqoop Installation
  • Mahout Installation
  • Hortonworks Installation
  • Cloudera Installation
  • Hadoop Command Usage
  • Importing Data into HDFS
  • Sample Hadoop Examples

Monitoring the Hadoop Cluster

  • Monitoring Hadoop Cluster with Ganglia
  • Monitoring Hadoop Cluster with Nagios
  • Monitoring Hadoop Cluster with JMX

Hadoop Configuration Management Tool

Hadoop Benchmarking