
Training Coverage

Our training programs cover the topics needed to deliver Big Data projects, from the fundamentals through to higher-level tools.

Big Data

  • The problem space and example applications
  • Why don’t traditional approaches scale?
  • Requirements

Hadoop Background

  • Hadoop history
  • The ecosystem and stack: HDFS, MapReduce, Hive, Pig…
  • Cluster architecture overview

Development Environment

  • Hadoop distribution and basic commands
  • Eclipse development

HDFS Introduction

  • The HDFS command line and web interfaces
  • The HDFS Java API (lab)

MapReduce Introduction

  • Key philosophy: move computation, not data
  • Core concepts: mappers, reducers, drivers
  • The MapReduce Java API (lab)
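
To make the three core concepts concrete, here is a minimal sketch of a word count written in plain Java. It deliberately uses no Hadoop classes, so it runs on its own; the class name `WordCountSketch` and its methods are hypothetical names chosen for illustration and are not part of the Hadoop API or of the course material. The in-memory grouping step stands in for the shuffle that a real cluster performs between the map and reduce phases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only: no Hadoop classes are used here.
class WordCountSketch {

    // Mapper role: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reducer role: combine every value seen for one key into a single result.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver role: wire the phases together. Grouping by key stands in
    // for the shuffle step that a cluster would perform across machines.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {computation=1, data=2, less=1, move=2, not=1}
        System.out.println(run(List.of("move computation not data", "move data less")));
    }
}
```

In a real job the mapper and reducer would run on the nodes that hold the data, which is the point of "move computation, not data"; this sketch only shows how the roles divide the work.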

Real-World MapReduce

  • Optimizing with Combiners and Partitioners (lab)
  • More common algorithms: sorting, indexing and searching (lab)
  • Relational manipulation: map-side and reduce-side joins (lab)
  • Chaining jobs
  • Testing with MRUnit
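
As a rough illustration of the first topic in this list, the sketch below shows what a combiner and a partitioner do, again in plain Java without any Hadoop dependency. The class `CombinerPartitionerSketch` and its method names are hypothetical. The partition function uses a simple hash-and-modulo scheme, which is an assumption made for this example; a real job may use any partitioning rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only: no Hadoop classes are used here.
class CombinerPartitionerSketch {

    // Combiner role: pre-aggregate one mapper's output locally, so fewer
    // records have to travel over the network to the reducers.
    static Map<String, Integer> combine(List<String> mapperWords) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String word : mapperWords) {
            partial.merge(word, 1, Integer::sum);
        }
        return partial;
    }

    // Partitioner role: decide which reducer receives a key.
    // Masking the sign bit keeps the result non-negative.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Route each mapper's combined output to its reducer and merge the
    // partial counts that arrive there.
    static List<Map<String, Integer>> shuffle(List<List<String>> mapperOutputs,
                                              int numReducers) {
        List<Map<String, Integer>> reducers = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            reducers.add(new TreeMap<>());
        }
        for (List<String> words : mapperOutputs) {
            for (Map.Entry<String, Integer> entry : combine(words).entrySet()) {
                int target = partition(entry.getKey(), numReducers);
                reducers.get(target).merge(entry.getKey(), entry.getValue(), Integer::sum);
            }
        }
        return reducers;
    }

    public static void main(String[] args) {
        List<List<String>> mapperOutputs = List.of(
                List.of("a", "b", "a"),
                List.of("a", "c"));
        // Each element of the printed list is the input of one reducer.
        System.out.println(shuffle(mapperOutputs, 2));
    }
}
```

Because every occurrence of a key hashes to the same partition, all partial counts for that key meet at a single reducer, which is what makes local pre-aggregation safe for an associative operation such as a sum.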

Higher-level Tools

  • Patterns to abstract “thinking in MapReduce”
  • The Cascading library (lab)
  • The Hive database (lab)