Cloudera Course


Cloudera Course Overview

Cloudera provides a scalable, flexible, integrated platform that makes it easy to manage rapidly increasing volumes and varieties of data in your enterprise. Cloudera products and solutions enable you to deploy and manage Apache Hadoop and related projects, manipulate and analyze your data, and keep that data secure and protected.

CDH is the most complete, tested, and popular distribution of Apache Hadoop and related projects. CDH delivers the core elements of Hadoop – scalable storage and distributed computing – along with a Web-based user interface and vital enterprise capabilities. CDH is Apache-licensed open source and is the only Hadoop solution to offer unified batch processing, interactive SQL and interactive search, and role-based access controls.

CDH is the Cloudera distribution of Apache Hadoop and other related open-source projects, including Impala and Cloudera Search.

This course is intended for software engineers, system analysts, database administrators, DevOps engineers, and system administrators who want to learn about the Big Data ecosystem with Cloudera.

A basic understanding of IT administration or development activities is expected. Familiarity with Linux, cloud basics, and system administration is an added advantage.

Completing this course can help you pursue roles such as data scientist, technical architect, software developer, tester, and Hadoop Cloudera administrator at major IT companies such as ADP, Allstate, AMD, Apollo Group, Barclays, Box, AOI, BlackBerry, and more.

The main concepts covered are Hadoop basic concepts, writing a MapReduce program, integrating Hadoop into the workflow, using Hive and Pig, common MapReduce algorithms, the Hadoop API, joining data sets in MapReduce jobs, and creating workflows.

Cloudera Course Syllabus

The Motivation for Hadoop

  • Problems with traditional large-scale systems
  • Requirements for a new approach

Hadoop Basic Concepts

  • An Overview of Hadoop
  • The Hadoop Distributed File System
  • Hands-On Exercise
  • How MapReduce Works
  • Hands-On Exercise
  • Anatomy of a Hadoop Cluster
  • Other Hadoop Ecosystem Components

Writing a MapReduce Program

  • The MapReduce Flow
  • Examining a Sample MapReduce Program
  • Basic MapReduce API Concepts
  • The Driver Code
  • The Mapper
  • The Reducer
  • Hadoop’s Streaming API
  • Using Eclipse for Rapid Development
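
To give a feel for the mapper, reducer, and driver roles listed above, here is a minimal sketch of word count that simulates the MapReduce flow in memory. It does not use the Hadoop API; the function names (`map_words`, `reduce_counts`, `run_word_count`) are made up for illustration, and the in-memory grouping only stands in for the shuffle that Hadoop performs between the map and reduce phases.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reducer: sum all counts seen for one key."""
    return word, sum(counts)


def run_word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Driver: run the map step, group values by key, then reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for word, one in map_words(line):
            grouped[word].append(one)
    # Reducers receive keys in sorted order; mimic that here.
    return dict(reduce_counts(w, grouped[w]) for w in sorted(grouped))


if __name__ == "__main__":
    print(run_word_count(["the cat sat", "the dog sat"]))
```

With Hadoop Streaming, the mapper and reducer would instead be separate scripts that read lines from standard input and write tab-separated key/value pairs to standard output.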

Integrating Hadoop into the Workflow

  • Relational Database Management Systems
  • Storage Systems
  • Creating workflows with Oozie
  • Importing Data from RDBMSs With Sqoop
  • Hands-On Exercise
  • Importing Real-Time Data with Flume
  • Accessing HDFS Using FuseDFS and Hoop

The Hadoop API

  • Using Combiners
  • Using LocalJobRunner Mode for Faster Development
  • Reducing Intermediate Data with Combiners
  • The configure and close methods for MapReduce Setup and Teardown
  • Writing Partitioners for Better Load Balancing
  • Directly Accessing HDFS
  • Using The Distributed Cache
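
As an illustration of two ideas from this module, the sketch below shows what a combiner and a partitioner do conceptually. It is plain Python rather than the Hadoop API: `combine` and `partition` are hypothetical names, and the CRC32-based hash is an assumption chosen only because it is stable across runs.

```python
import zlib
from collections import Counter
from typing import Iterable, List, Tuple


def combine(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Combiner: pre-sum the counts produced by one map task.

    Fewer (key, value) pairs then have to cross the network in the shuffle.
    """
    totals: Counter = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def partition(key: str, num_reducers: int) -> int:
    """Partitioner: choose a reducer index for a key using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


if __name__ == "__main__":
    map_output = [("a", 1), ("b", 1), ("a", 1)]
    for key, total in combine(map_output):
        print(key, total, "-> reducer", partition(key, 4))
```

A combiner of this kind is only safe when the reduce operation is associative and commutative, as summation is. A custom partitioner becomes useful when the default hashing sends a disproportionate share of keys to one reducer.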

Using Hive and Pig

  • Hive Basics
  • Pig Basics

Common MapReduce Algorithms

  • Sorting and Searching
  • Indexing
  • Machine Learning with Mahout
  • Term Frequency – Inverse Document Frequency
  • Word Co-Occurrence
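
Term Frequency – Inverse Document Frequency can be sketched in a few lines. The version below is one common variant, assumed here for illustration: term frequency is the term's share of the words in a document, and inverse document frequency is the natural log of the number of documents divided by the number of documents containing the term. The function name `tf_idf` and the input shape are hypothetical, and a real MapReduce implementation would typically split this work across several jobs.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tf_idf(docs: Dict[str, List[str]]) -> Dict[Tuple[str, str], float]:
    """Return a score for every (doc_id, term) pair in the corpus."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    doc_freq: Counter = Counter()
    for terms in docs.values():
        doc_freq.update(set(terms))

    scores: Dict[Tuple[str, str], float] = {}
    for doc_id, terms in docs.items():
        counts = Counter(terms)
        for term, count in counts.items():
            tf = count / len(terms)
            idf = math.log(n_docs / doc_freq[term])
            scores[(doc_id, term)] = tf * idf
    return scores


if __name__ == "__main__":
    corpus = {"d1": ["hadoop", "hive"], "d2": ["hadoop", "pig"]}
    for pair, score in sorted(tf_idf(corpus).items()):
        print(pair, round(score, 4))
```

Under this variant, a term that appears in every document scores zero, which is why very common words carry little weight.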

Practical Development Tips and Techniques

  • Testing with MRUnit
  • Debugging MapReduce Code
  • Using LocalJobRunner Mode for Easier Debugging
  • Eclipse development techniques
  • Retrieving Job Information with Counters
  • Logging
  • Splittable File Formats
  • Determining the Optimal Number of Reducers
  • Map-Only MapReduce Jobs
  • Implementing Multiple Mappers using ChainMapper

More Advanced MapReduce Programming

  • Custom Writables and WritableComparables
  • Saving Binary Data using SequenceFiles and Avro Files
  • Creating InputFormats and OutputFormats

Joining Data Sets in MapReduce Jobs

  • Map-Side Joins
  • The Secondary Sort
  • Reduce-Side Joins
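
The idea behind a reduce-side join can be sketched as follows: records from both data sets are tagged with their source, grouped by join key, and paired up per key. This is an in-memory simulation of an inner join, not Hadoop code; `reduce_side_join` and the `"L"`/`"R"` tags are names invented for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def reduce_side_join(
    left: Iterable[Tuple[str, str]],
    right: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Inner-join two (key, value) data sets on their keys."""
    grouped: Dict[str, Dict[str, List[str]]] = defaultdict(
        lambda: {"L": [], "R": []}
    )
    # "Map" step: tag each record with the data set it came from.
    for key, value in left:
        grouped[key]["L"].append(value)
    for key, value in right:
        grouped[key]["R"].append(value)

    # "Reduce" step: for each key, pair every left value with every right value.
    joined: List[Tuple[str, str, str]] = []
    for key in sorted(grouped):
        for left_value in grouped[key]["L"]:
            for right_value in grouped[key]["R"]:
                joined.append((key, left_value, right_value))
    return joined


if __name__ == "__main__":
    users = [("1", "alice"), ("2", "bob")]
    orders = [("1", "order-a"), ("1", "order-b"), ("3", "order-c")]
    print(reduce_side_join(users, orders))
```

A map-side join avoids the shuffle entirely, but usually requires that one data set fits in memory or that both inputs are already sorted and partitioned the same way.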

Graph Manipulation in Hadoop

  • Introduction to graph techniques
  • Representing Graphs in Hadoop
  • Implementing a sample algorithm: Single Source Shortest Path
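
Single Source Shortest Path is often expressed in MapReduce as repeated rounds of relaxation, with one job per round. The sketch below simulates that pattern in memory under some assumptions: the graph is an adjacency list of `(neighbour, weight)` pairs, edge weights are non-negative, and the name `shortest_paths` is hypothetical. It is not tied to any particular course exercise.

```python
import math
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def shortest_paths(graph: Graph, source: str) -> Dict[str, float]:
    """Iteratively relax edges until no distance changes."""
    nodes = set(graph)
    for edges in graph.values():
        nodes.update(neighbour for neighbour, _ in edges)
    dist = {node: math.inf for node in nodes}
    dist[source] = 0.0

    changed = True
    while changed:  # each pass corresponds to one MapReduce job
        changed = False
        # "Map": every reached node proposes a distance to each neighbour.
        proposals = [
            (neighbour, dist[node] + weight)
            for node, edges in graph.items()
            if dist[node] < math.inf
            for neighbour, weight in edges
        ]
        # "Reduce": keep the smallest proposed distance per node.
        for neighbour, candidate in proposals:
            if candidate < dist[neighbour]:
                dist[neighbour] = candidate
                changed = True
    return dist


if __name__ == "__main__":
    g = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
    print(shortest_paths(g, "a"))
```

Nodes that cannot be reached from the source keep a distance of infinity. On a real cluster, the "did anything change" check is commonly implemented with a job counter.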

Creating Workflows with Oozie

  • The Motivation for Oozie
  • Oozie’s Workflow Definition Format