Hadoop, (Part 2 of 4): ETL and MapReduce
Interactive

Biz Library
Updated Feb 04, 2020

In this course, Hadoop expert Kevin McCarty takes a closer look at some of the major components underpinning Hadoop – services such as Mahout, Oozie, and ZooKeeper, and languages such as Pig and Hive. He examines the Hadoop architecture and some of the ETL tools Hadoop provides for moving data between a Hadoop cluster and external servers. Finally, McCarty demonstrates a simple application in Java and follows it with a deep dive into MapReduce, including a look at automation using the Linux cron utility.


Lesson 1:

  • Where Do You Find Big Data?
  • Big Data Sources - Volume
  • Big Data Sources - Variety
  • Structured Data
  • Semi-Structured
  • Unstructured Data
  • Problems with Big Data
  • Data Integrity
  • Data Completeness
  • Data Format
  • Data Timeliness
  • How Do We Process Big Data?
  • What Is ETL? - Extraction
  • What Is ETL? - Transform
  • What Is ETL? - Load

Lesson 2:

  • In This Exercise...
  • Demo: Sqoop
  • Demo: Working with Tables
  • Demo: ETL

Lesson 3:

  • What Is MapReduce?
  • History of MapReduce
  • MapReduce - Benefits
  • MapReduce - Limitations
  • Demo: MapReduce
  • Demo: Create a Jar File

Lesson 4:

  • Demo: MapReduce Setup
  • Demo: Word Count Program

Lesson 5:

  • Language Support
  • How Streaming Works
  • Creating a MapReduce Application
  • MapReduce - Execution
  • MapReduce - Main
  • MapReduce - The Mapper
  • MapReduce - The Reducer
  • Demo: Create Java File
  • Demo: MapReduce
  • Demo: Map Method
  • Demo: Reduce Function

Lesson 6:

  • Ad-Hoc vs. Scheduling
  • Cron Jobs
  • Cron Tables
  • Creating a Cron Job
  • Example Cron Job Text
  • Demo: Cron Scheduling