Hadoop Training

₹ 19000

₹ 9999

Overview

  • Lectures 30
  • Quizzes 1
  • Duration 60 hours
  • Skill level Beginner
  • Language English
  • Assessments Yes
Learning Outcomes & Flexibilities
  • Over 18 modules and 55.5 hours of content!
  • Learn from Industry Experts.
  • Gain Job-ready Skills.
  • Attend Live Instructor-led Sessions.
  • Get help creating a world-class resume, promoting your profile, building salary negotiation skills, and preparing through mock interview sessions.
  • Lifetime access to recorded sessions.
  • Get personalized learning experience with your mentor, who will track your progress and provide insights.
  • Job assistance and interview scheduling.
  • Certification after completion of the course.

Curriculum

  • 1. HADOOP INSTALLATION AND SETUP
    • 1.1 The architecture of Hadoop cluster
    • 1.2 What is high availability and federation?
    • 1.3 How to set up a production cluster?
    • 1.4 Various shell commands in Hadoop
    • 1.5 Understanding configuration files in Hadoop
    • 1.6 Installing a single node cluster with Cloudera Manager
    • 1.7 Understanding Spark, Scala, Sqoop, Pig, and Flume
  • 2. INTRODUCTION TO BIG DATA HADOOP AND UNDERSTANDING HDFS AND MAPREDUCE
    • 2.1 Introducing Big Data and Hadoop
    • 2.2 What is Big Data, and where does Hadoop fit in?
    • 2.3 Two important Hadoop ecosystem components, namely MapReduce and HDFS
    • 2.4 In-depth Hadoop Distributed File System: replication, block size, Secondary NameNode, and high availability; in-depth YARN: ResourceManager and NodeManager
  • 3. DEEP DIVE IN MAPREDUCE
    • 3.1 Learning the working mechanism of MapReduce
    • 3.2 Understanding the mapping and reducing stages in MR
    • 3.3 Key MR terminology such as input format, output format, partitioners, combiners, shuffle, and sort
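The map, shuffle/sort, and reduce stages covered in this module can be sketched in plain Python (a simulation of the stages for intuition only, not Hadoop's Java MapReduce API):

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_and_sort(pairs):
    """Shuffle & sort: group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_phase(grouped):
    """Reduce: aggregate the values for each key."""
    return {key: sum(values) for key, values in grouped}

lines = ["Hadoop MapReduce", "hadoop hdfs"]
counts = reduce_phase(shuffle_and_sort(map_phase(lines)))
print(counts)  # {'hadoop': 2, 'hdfs': 1, 'mapreduce': 1}
```

In real Hadoop, a combiner would run the reduce logic locally on each mapper's output before the shuffle, cutting network traffic.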
  • 4. INTRODUCTION TO HIVE
    • 4.1 Introducing Hadoop Hive
    • 4.2 Detailed architecture of Hive
    • 4.3 Comparing Hive with Pig and RDBMS
    • 4.4 Working with Hive Query Language
    • 4.5 Creation of a database, table, group by, and other clauses
    • 4.6 Various types of Hive tables and HCatalog
    • 4.7 Storing Hive results, Hive partitioning, and buckets
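HiveQL is deliberately SQL-like, so the table-creation and GROUP BY work in this module looks much like standard SQL. A minimal sketch using Python's built-in sqlite3 as a stand-in (real Hive compiles such queries into distributed jobs rather than running them locally):

```python
import sqlite3

# In-memory SQLite stands in for Hive here; the query shape is what matters.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("alice", 10.0), ("bob", 5.0), ("alice", 7.5)])

# A GROUP BY aggregation of the kind written in HiveQL.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(rows)  # [('alice', 17.5), ('bob', 5.0)]
```

In Hive, the same table would typically be partitioned (e.g. by date) so queries scan only the relevant HDFS directories.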
  • 5. ADVANCED HIVE AND IMPALA
    • 5.1 Indexing in Hive
    • 5.2 The map-side join in Hive
    • 5.3 Working with complex data types
    • 5.4 The Hive user-defined functions
    • 5.5 Introduction to Impala
    • 5.6 Comparing Hive with Impala
    • 5.7 The detailed architecture of Impala
  • 6. INTRODUCTION TO PIG
    • 6.1 Apache Pig introduction and its various features
    • 6.2 Various data types and schema in Pig
    • 6.3 The available functions in Pig; bags, tuples, and fields in Pig
  • 7. FLUME, SQOOP, AND HBASE
    • 7.1 Apache Sqoop introduction
    • 7.2 Importing and exporting data
    • 7.3 Performance improvement with Sqoop
    • 7.4 Sqoop limitations
    • 7.5 Introduction to Flume and understanding the architecture of Flume
    • 7.6 What is HBase, and what is the CAP theorem?
  • 8. WRITING SPARK APPLICATIONS USING SCALA
    • 8.1 Using Scala for writing Apache Spark applications
    • 8.2 Detailed study of Scala
    • 8.3 The need for Scala
    • 8.4 The concept of object-oriented programming
    • 8.5 Executing the Scala code
    • 8.6 Scala class constructs such as getters, setters, constructors, abstract classes, extending objects, and overriding methods
    • 8.7 The Java and Scala interoperability
    • 8.8 The concept of functional programming and anonymous functions
    • 8.9 Bobsrockets package and comparing the mutable and immutable collections
    • 8.10 Scala REPL, lazy values, control structures in Scala, directed acyclic graph (DAG), first Spark application using SBT/Eclipse, Spark Web UI, and Spark in Hadoop ecosystem
  • 9. SPARK FRAMEWORK
    • 9.1 Apache Spark in detail and its various features
    • 9.2 Comparing with Hadoop
    • 9.3 Various Spark components
    • 9.4 Combining HDFS with Spark and Scalding
    • 9.5 Introduction to Scala
    • 9.6 Importance of Scala and RDDs. Hands-on exercise: the resilient distributed dataset (RDD) in Spark, and how it helps speed up Big Data processing
  • 10. RDDS IN SPARK
    • 10.1 Understanding Spark RDD operations
    • 10.2 Comparison of Spark with MapReduce
    • 10.3 What is a Spark transformation?
    • 10.4 Loading data in Spark
    • 10.5 Types of RDD operations, viz. transformation and action
    • 10.6 What is a Key/Value pair?
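Two ideas from this module can be sketched in plain Python (an analogy for intuition, not the Spark API): transformations are lazy, like generator expressions, and only an action forces evaluation; a key/value pair is simply a 2-tuple, which operations like reduceByKey aggregate on the first element.

```python
data = [1, 2, 3, 4]

# "Transformations": nothing is computed yet, only a plan is built.
doubled = (x * 2 for x in data)
evens = (x for x in doubled if x > 4)

# "Action": evaluation happens only here, like collect() in Spark.
result = list(evens)
print(result)  # [6, 8]

# Key/value pairs: reduceByKey-style aggregation on 2-tuples.
pairs = [("a", 1), ("b", 2), ("a", 3)]
totals = {}
for k, v in pairs:
    totals[k] = totals.get(k, 0) + v
print(totals)  # {'a': 4, 'b': 2}
```

Spark's laziness lets it fuse chained transformations into a single pass over the data, which is one reason it outperforms multi-pass MapReduce pipelines.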
  • 11. DATAFRAMES AND SPARK SQL
    • 11.1 Spark SQL in detail
    • 11.2 The significance of SQL in Spark for working with structured data
    • 11.3 Spark SQL JSON support
    • 11.4 Working with XML data and parquet files
    • 11.5 Creating Hive Context
    • 11.6 Writing a DataFrame to Hive
    • 11.7 How to read a JDBC file?
    • 11.8 Significance of Spark DataFrames
    • 11.9 How to create a DataFrame?
    • 11.10 What is manual schema inference?
    • 11.11 Working with CSV files, JDBC table reading, data conversion from a DataFrame to JDBC, Spark SQL user-defined functions, shared variable, and accumulators
    • 11.12 How to query and transform data in DataFrames?
    • 11.13 How a DataFrame provides the benefits of both Spark RDDs and Spark SQL
    • 11.14 Deploying Hive on Spark as the execution engine
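When Spark SQL is not handed an explicit schema, it samples rows and infers column types for the DataFrame. A minimal pure-Python sketch of that idea (an illustration only, not Spark's implementation):

```python
def infer_schema(rows):
    """Infer a column -> type-name mapping from sample rows (dicts)."""
    schema = {}
    for row in rows:
        for col, value in row.items():
            t = type(value).__name__
            schema.setdefault(col, t)
            if schema[col] != t:
                schema[col] = "string"  # conflicting types fall back to string
    return schema

rows = [{"name": "alice", "age": 34}, {"name": "bob", "age": 29}]
print(infer_schema(rows))  # {'name': 'str', 'age': 'int'}
```

Supplying the schema manually, as covered in topic 11.10, skips this sampling pass and avoids surprises when early rows are unrepresentative.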
  • 12. MACHINE LEARNING USING SPARK (MLLIB)
    • 12.1 Introduction to Spark MLlib
    • 12.2 Understanding various algorithms
    • 12.3 What is a Spark iterative algorithm?
    • 12.4 Spark graph processing analysis
    • 12.5 Introducing Machine Learning
    • 12.6 K-means clustering
    • 12.7 Spark variables like shared and broadcast variables
    • 12.8 What are accumulators?
    • 12.9 Various ML algorithms supported by MLlib
    • 12.10 Linear regression, logistic regression, decision tree, random forest, and k-means clustering techniques
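The k-means clustering taught in this module alternates between assigning points to their nearest center and moving each center to its cluster's mean. A self-contained 1-D sketch of that loop (MLlib's KMeans runs the same idea distributed over a cluster):

```python
def kmeans_1d(points, centers, iterations=10):
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

points = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
print(kmeans_1d(points, centers=[0.0, 5.0]))  # [1.5, 10.5]
```

Because every iteration re-reads the full dataset, k-means is exactly the kind of iterative algorithm (topic 12.3) where Spark's in-memory caching pays off over MapReduce.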
  • 13. INTEGRATING APACHE FLUME AND APACHE KAFKA
    • 13.1 What is Kafka, and why use it?
    • 13.2 Kafka architecture and workflow
    • 13.3 Configuring Kafka cluster
    • 13.4 Basic operations
    • 13.5 Kafka monitoring tools
    • 13.6 Integrating Apache Flume and Apache Kafka
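Kafka's core abstraction, covered in this module, is a topic as an append-only log from which each consumer tracks its own read offset. A toy in-memory sketch of that idea (an illustration only; real Kafka partitions and replicates the log across brokers):

```python
class Topic:
    """A single in-memory 'topic': an append-only message log."""
    def __init__(self):
        self.log = []

    def produce(self, message):
        self.log.append(message)

    def consume(self, offset):
        """Return messages from `offset` onward, plus the advanced offset."""
        messages = self.log[offset:]
        return messages, offset + len(messages)

topic = Topic()
topic.produce("event-1")
topic.produce("event-2")

msgs, offset = topic.consume(0)       # a consumer starts at offset 0
print(msgs)   # ['event-1', 'event-2']

topic.produce("event-3")
msgs, offset = topic.consume(offset)  # only new messages are read
print(msgs)   # ['event-3']
```

Because consumers own their offsets, many independent readers can replay the same log at their own pace, which is what makes Kafka a natural buffer between Flume and downstream systems.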
  • 14. SPARK STREAMING
    • 14.1 Introduction to Spark Streaming
    • 14.2 The architecture of Spark Streaming
    • 14.3 Working with the Spark Streaming program
    • 14.4 Processing data using Spark Streaming
    • 14.5 Requesting count and DStream
    • 14.6 Multi-batch and sliding window operations
    • 14.7 Working with advanced data sources
    • 14.8 Features of Spark Streaming
    • 14.9 Spark Streaming workflow
    • 14.10 Initializing StreamingContext
    • 14.11 Discretized Streams (DStreams)
    • 14.12 Input DStreams and Receivers
    • 14.13 Transformations on DStreams
    • 14.14 Output operations on DStreams
    • 14.15 Windowed operators and their uses
    • 14.16 Important windowed operators and stateful operators
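The sliding-window operations in this module aggregate over the last N micro-batches and emit a result once per slide interval. A plain-Python sketch of that behavior (an analogy for the concept, not the DStream API):

```python
from collections import deque

def windowed_counts(batches, window_length, slide_interval):
    """Count events over a window of the last `window_length` batches,
    emitting once every `slide_interval` batches."""
    window = deque(maxlen=window_length)  # keeps only the most recent batches
    results = []
    for t, batch in enumerate(batches, start=1):
        window.append(batch)
        if t % slide_interval == 0:       # emit once per slide interval
            results.append(sum(len(b) for b in window))
    return results

batches = [["a"], ["b", "c"], ["d"], ["e", "f"]]
print(windowed_counts(batches, window_length=3, slide_interval=2))  # [3, 5]
```

In Spark Streaming, window length and slide interval are given as durations and must both be multiples of the batch interval.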
  • 15. HADOOP ADMINISTRATION – MULTI-NODE CLUSTER SETUP USING AMAZON EC2
    • 15.1 Setting up a 4-node Hadoop cluster
    • 15.2 Running the MapReduce Jobs on the Hadoop cluster
    • 15.3 Successfully running the MapReduce code
    • 15.4 Working with the Cloudera Manager setup
  • 16. HADOOP ADMINISTRATION – CLUSTER CONFIGURATION
    • 16.1 Overview of Hadoop configuration
    • 16.2 The importance of Hadoop configuration files
    • 16.3 The various parameters and values of configuration
    • 16.4 HDFS parameters and MapReduce parameters
    • 16.5 Setting up the Hadoop environment
    • 16.6 Include and exclude configuration files
    • 16.7 The administration and maintenance of NameNode, DataNode, directory structures, and files
    • 16.8 What is a file system image (fsimage)?
    • 16.9 Understanding the edit log
  • 17. HADOOP ADMINISTRATION: MAINTENANCE, MONITORING, AND TROUBLESHOOTING
    • 17.1 Introduction to the checkpoint procedure and NameNode failure
    • 17.2 The recovery procedure, safe mode, metadata and data backup, potential problems and their solutions, what to look for, and how to add and remove nodes
  • 18. ETL CONNECTIVITY WITH HADOOP ECOSYSTEM (SELF-PACED)
    • 18.1 How do ETL tools work in the Big Data industry?
    • 18.2 Introduction to ETL and data warehousing
    • 18.3 Working with prominent use cases of Big Data in the ETL industry
    • 18.4 End-to-end ETL PoC showing Big Data integration with the ETL tool
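The extract-transform-load pattern this module builds its PoC around can be sketched end to end in a few lines of plain Python (a toy flow; real Big Data ETL tools run the same steps at cluster scale):

```python
import csv
import io

# Extract: read raw source data (an inline CSV stands in for a source system).
raw = "id,amount\n1,10.5\n2,abc\n3,4.0\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# Transform: validate and cast fields, dropping bad records
# (a production flow would route rejects to an error sink instead).
clean = []
for row in rows:
    try:
        clean.append({"id": int(row["id"]), "amount": float(row["amount"])})
    except ValueError:
        pass

# Load: write the cleaned records to the target store (a dict stands in).
warehouse = {r["id"]: r["amount"] for r in clean}
print(warehouse)  # {1: 10.5, 3: 4.0}
```

In a Hadoop-based ETL flow, the load step would typically land the cleaned data in HDFS or a Hive table rather than an in-memory structure.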