Hadoop Training Program
Ideal Training's Hadoop program is curated by industry experts and covers in-depth knowledge of all the major Hadoop tools. Through this online, industry-led Hadoop certification training, you will work on real-life industry cases.
Overview
- Lectures: 30
- Quizzes: 1
- Duration: 60 hours
- Skill level: Beginner
- Language: English
- Assessments: Yes
Learning Outcomes & Flexibility
- Over 18 modules and 55.5 hours of content!
- Learn from industry experts.
- Gain job-ready skills.
- Attend live instructor-led sessions.
- Get help creating a world-class resume, promoting your profile, building salary negotiation skills, and preparing through mock interview sessions.
- Lifetime access to recorded sessions.
- Get a personalized learning experience with a mentor who tracks your progress and provides insights.
- Job assistance and interview scheduling.
- Certification upon course completion.
Curriculum
1. HADOOP INSTALLATION AND SETUP
1.1 The architecture of a Hadoop cluster
1.2 What are high availability and federation?
1.3 How to set up a production cluster
1.4 Various shell commands in Hadoop (see the sketch below)
1.5 Understanding configuration files in Hadoop
1.6 Installing a single-node cluster with Cloudera Manager
1.7 Understanding Spark, Scala, Sqoop, Pig, and Flume
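To give a feel for item 1.4, the `hdfs dfs` shell commands have direct equivalents in Hadoop's FileSystem API, usable from Scala as-is. A minimal sketch, assuming a NameNode reachable at hdfs://localhost:9000 (a placeholder address) and a local file named data.txt:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsShellBasics {
  def main(args: Array[String]): Unit = {
    // Point the client at the cluster; the URI is a placeholder for your NameNode
    val conf = new Configuration()
    conf.set("fs.defaultFS", "hdfs://localhost:9000")
    val fs = FileSystem.get(conf)

    // Equivalent of `hdfs dfs -mkdir -p /user/demo`
    fs.mkdirs(new Path("/user/demo"))

    // Equivalent of `hdfs dfs -put data.txt /user/demo/`
    fs.copyFromLocalFile(new Path("data.txt"), new Path("/user/demo/data.txt"))

    // Equivalent of `hdfs dfs -ls /user/demo`
    fs.listStatus(new Path("/user/demo")).foreach(s => println(s.getPath))

    fs.close()
  }
}
```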
2. INTRODUCTION TO BIG DATA HADOOP AND UNDERSTANDING HDFS AND MAPREDUCE
2.1 Introducing Big Data and Hadoop
2.2 What is Big Data, and where does Hadoop fit in?
2.3 Two important Hadoop ecosystem components: MapReduce and HDFS
2.4 In-depth Hadoop Distributed File System – replication, block size, Secondary NameNode, and high availability; in-depth YARN – ResourceManager and NodeManager (see the sketch below)
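The replication and block-size behavior from 2.4 can also be inspected programmatically. A minimal Scala sketch, assuming site configuration files on the classpath and a file already stored at the hypothetical path /user/demo/data.txt:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object InspectHdfsFile {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration() // picks up core-site.xml/hdfs-site.xml on the classpath
    val fs   = FileSystem.get(conf)

    val status = fs.getFileStatus(new Path("/user/demo/data.txt"))
    println(s"Replication factor: ${status.getReplication}") // commonly 3 by default
    println(s"Block size (bytes): ${status.getBlockSize}")   // commonly 128 MB by default

    // Where each block physically lives (DataNode hostnames)
    fs.getFileBlockLocations(status, 0, status.getLen)
      .foreach(loc => println(loc.getHosts.mkString(", ")))

    fs.close()
  }
}
```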
3. DEEP DIVE INTO MAPREDUCE
3.1 Learning the working mechanism of MapReduce (illustrated in the sketch below)
3.2 Understanding the mapping and reducing stages in MapReduce
3.3 Various MapReduce terms such as input format, output format, partitioners, combiners, shuffle, and sort
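To preview how the stages in 3.1 and 3.2 fit together, the classic word count can be mimicked with plain Scala collections. This is only a conceptual illustration of the map, shuffle-and-sort, and reduce data flow, not the Hadoop MapReduce API itself:

```scala
object MapReduceConceptually {
  def main(args: Array[String]): Unit = {
    val lines = Seq("to be or not to be", "to do or not to do")

    // Map stage: each line becomes (word, 1) pairs
    val mapped = lines.flatMap(_.split(" ")).map(word => (word, 1))

    // Shuffle and sort: group all pairs by key, as the framework does between stages
    val shuffled = mapped.groupBy(_._1)

    // Reduce stage: sum the counts for each word
    val reduced = shuffled.map { case (word, pairs) => (word, pairs.map(_._2).sum) }

    reduced.toSeq.sortBy(_._1).foreach { case (w, c) => println(s"$w\t$c") }
  }
}
```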
4. INTRODUCTION TO HIVE
4.1 Introducing Hadoop Hive
4.2 Detailed architecture of Hive
4.3 Comparing Hive with Pig and RDBMS
4.4 Working with Hive Query Language (see the sketch below)
4.5 Creation of a database, tables, GROUP BY, and other clauses
4.6 Various types of Hive tables and HCatalog
4.7 Storing Hive results, Hive partitioning, and buckets
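As a taste of 4.4 and 4.5, HiveQL statements can be issued through a Hive-enabled SparkSession. A minimal sketch, assuming a configured Hive metastore; the retail database and orders table are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object HiveQlBasics {
  def main(args: Array[String]): Unit = {
    // Hive support lets spark.sql run HiveQL against the metastore
    val spark = SparkSession.builder()
      .appName("HiveQlBasics")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("CREATE DATABASE IF NOT EXISTS retail")

    // A partitioned table, as covered in 4.7
    spark.sql(
      """CREATE TABLE IF NOT EXISTS retail.orders (
        |  order_id INT, amount DOUBLE
        |) PARTITIONED BY (order_date STRING)""".stripMargin)

    // A GROUP BY clause, as covered in 4.5
    spark.sql(
      "SELECT order_date, SUM(amount) FROM retail.orders GROUP BY order_date"
    ).show()

    spark.stop()
  }
}
```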
5. ADVANCED HIVE AND IMPALA
-
5.1 Indexing in Hive
-
5.2 The map-side join in Hive
-
5.3 Working with complex data types
-
5.4 The Hive user-defined functions
-
5.5 Introduction to Impala
-
5.6 Comparing Hive with Impala
-
5.7 The detailed architecture of Impala
-
-
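For 5.4, here is one illustration of a user-defined function. This sketch registers a UDF through Spark's Hive-compatible SQL layer rather than the classic route of extending Hive's UDF class and issuing CREATE FUNCTION; the mask_email function is invented for the example:

```scala
import org.apache.spark.sql.SparkSession

object HiveUdfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveUdfSketch")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Register a simple UDF usable from SQL; masks every character of the
    // local part of an e-mail address except the first one
    spark.udf.register("mask_email", (email: String) =>
      email.replaceAll("(?<=.).(?=[^@]*@)", "*"))

    Seq("alice@example.com", "bob@example.com").toDF("email")
      .createOrReplaceTempView("users")

    spark.sql("SELECT email, mask_email(email) AS masked FROM users").show(false)
    spark.stop()
  }
}
```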
6. INTRODUCTION TO PIG
-
6.1 Apache Pig introduction and its various features .
-
6.2 Various data types and schema in Pig
-
6.3 The available functions in Pig, Hive bags, tuples, and fields
-
-
7. FLUME, SQOOP, AND HBASE
-
7.1 Apache Sqoop introduction
-
7.2 Importing and exporting data
-
7.3 Performance improvement with Sqoop
-
7.4 Sqoop limitations
-
7.5 Introduction to Flume and understanding the architecture of Flume
-
7.6 What are HBase and the CAP theorem?
-
-
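For 7.6, a minimal write-then-read against HBase using its standard Java client, called from Scala. The users table and its info column family are hypothetical, and hbase-site.xml is assumed to be on the classpath:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBasePutGet {
  def main(args: Array[String]): Unit = {
    // Reads hbase-site.xml from the classpath for the ZooKeeper quorum details
    val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = conn.getTable(TableName.valueOf("users")) // hypothetical table

    // Write one cell: row key "u1", column family "info", qualifier "name"
    val put = new Put(Bytes.toBytes("u1"))
    put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"))
    table.put(put)

    // Read it back
    val result = table.get(new Get(Bytes.toBytes("u1")))
    println(Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))))

    table.close(); conn.close()
  }
}
```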
8. WRITING SPARK APPLICATIONS USING SCALA
-
8.1 Using Scala for writing Apache Spark applications
-
8.2 Detailed study of Scala
-
8.3 The need for Scala
-
8.4 The concept of object-oriented programming
-
8.5 Executing the Scala code
-
8.6 Various classes in Scala such as getters, setters, constructors, abstract, extending objects, and overriding methods
-
8.7 The Java and Scala interoperability
-
8.8 The concept of functional programming and anonymous functions
-
8.9 Bobsrockets package and comparing the mutable and immutable collections
-
8.10 Scala REPL, lazy values, control structures in Scala, directed acyclic graph (DAG), first Spark application using SBT/Eclipse, Spark Web UI, and Spark in Hadoop ecosystem
-
-
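A compact sketch touching several of the Scala topics above: constructors, generated getters and setters, overriding (8.6), anonymous functions, and immutable collections (8.8, 8.9). The Rocket classes are invented for illustration:

```scala
// A constructor parameter marked `var` gives Scala-style getters and setters
class Rocket(var name: String, val stages: Int) {
  def describe(): String = s"$name with $stages stages"
}

// Subclassing with method overriding, as in 8.6
class ReusableRocket(name: String) extends Rocket(name, 2) {
  override def describe(): String = super.describe() + " (reusable)"
}

object ScalaBasics {
  def main(args: Array[String]): Unit = {
    val r = new ReusableRocket("Falcon")
    r.name = "Falcon 9"                   // calls the generated setter
    println(r.describe())

    // Anonymous function + immutable collection (8.8, 8.9)
    val lengths = List("map", "reduce", "shuffle").map(s => s.length)
    println(lengths)                      // List(3, 6, 7)
  }
}
```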
9. SPARK FRAMEWORK
-
9.1 Detailed Apache Spark and its various features
-
9.2 Comparing with Hadoop
-
9.3 Various Spark components
-
9.4 Combining HDFS with Spark and Scalding
-
9.5 Introduction to Scala
-
9.6 Importance of Scala and RDDs Hands-on Exercise: The resilient distributed dataset (RDD) in Spark, How does it help speed up Big Data processing?
-
-
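For the hands-on exercise, a minimal first Spark application in the spirit of 8.10: an RDD word count that can be built with SBT and run locally. The input file data.txt is a placeholder:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object FirstSparkApp {
  def main(args: Array[String]): Unit = {
    // local[*] runs on all local cores; on a cluster you would omit setMaster
    // and pass the master via spark-submit instead
    val conf = new SparkConf().setAppName("FirstSparkApp").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    // The RDD lineage is built lazily; nothing runs until an action is called
    val counts = sc.textFile("data.txt")        // hypothetical input file
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.take(10).foreach(println)            // the action that triggers execution
    sc.stop()
  }
}
```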
10. RDDS IN SPARK
-
10.1 Understanding Spark RDD operations
-
10.2 Comparison of Spark with MapReduce
-
10.3 What is a Spark transformation?
-
10.4 Loading data in Spark
-
10.5 Types of RDD operations, viz. transformation and action
-
10.6 What is a Key/Value pair?
-
-
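A short sketch of 10.5 and 10.6: transformations build a lazy lineage, actions trigger execution, and key/value pairs enable by-key aggregation. It runs locally with no external data:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddOperations {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("RddOperations").setMaster("local[*]"))

    val numbers = sc.parallelize(1 to 10)

    // Transformations are lazy: these lines only build a lineage graph
    val evens   = numbers.filter(_ % 2 == 0)
    val squared = evens.map(n => n * n)

    // Actions trigger the actual computation
    println(squared.collect().mkString(", "))   // 4, 16, 36, 64, 100
    println(squared.count())                    // 5

    // Key/value pairs unlock by-key operations such as reduceByKey
    val byParity = numbers.map(n => (n % 2, n)).reduceByKey(_ + _)
    byParity.collect().foreach(println)         // (0,30) and (1,25)

    sc.stop()
  }
}
```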
11. DATAFRAMES AND SPARK SQL
11.1 Spark SQL in detail
11.2 The significance of SQL in Spark for working with structured data
11.3 Spark SQL JSON support
11.4 Working with XML data and Parquet files
11.5 Creating a HiveContext
11.6 Writing a DataFrame to Hive
11.7 How to read data via JDBC
11.8 Significance of Spark DataFrames
11.9 How to create a DataFrame
11.10 What is manual schema inference?
11.11 Working with CSV files, JDBC table reading, data conversion from a DataFrame to JDBC, Spark SQL user-defined functions, shared variables, and accumulators
11.12 How to query and transform data in DataFrames (see the sketch below)
11.13 How a DataFrame provides the benefits of both Spark RDDs and Spark SQL
11.14 Deploying Hive on Spark as the execution engine
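A sketch pulling together 11.3, 11.9, and 11.12: reading JSON into a DataFrame, querying it with SQL, and writing Parquet. The input file people.json is a placeholder:

```scala
import org.apache.spark.sql.SparkSession

object DataFrameBasics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DataFrameBasics").master("local[*]").getOrCreate()

    // Spark infers the schema from the JSON automatically (11.3)
    val people = spark.read.json("people.json")     // hypothetical input file
    people.printSchema()

    // Register the DataFrame so plain SQL can query it (11.12)
    people.createOrReplaceTempView("people")
    val adults = spark.sql("SELECT name, age FROM people WHERE age >= 18")
    adults.show()

    // Persist the result as Parquet, one of the formats covered in 11.4
    adults.write.mode("overwrite").parquet("adults.parquet")
    spark.stop()
  }
}
```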
12. MACHINE LEARNING USING SPARK (MLLIB)
12.1 Introduction to Spark MLlib
12.2 Understanding various algorithms
12.3 What are iterative algorithms in Spark?
12.4 Spark graph processing analysis
12.5 Introducing Machine Learning
12.6 K-means clustering (see the sketch below)
12.7 Spark variables such as shared and broadcast variables
12.8 What are accumulators?
12.9 Various ML algorithms supported by MLlib
12.10 Linear regression, logistic regression, decision tree, random forest, and k-means clustering techniques
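For 12.6, a minimal k-means example with Spark MLlib's DataFrame-based API. The four toy points are invented so the two expected clusters are obvious:

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object KMeansSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KMeansSketch").master("local[*]").getOrCreate()
    import spark.implicits._

    // Toy 2-D points; MLlib expects a "features" vector column
    val points = Seq(
      Vectors.dense(0.0, 0.0), Vectors.dense(0.5, 0.5),
      Vectors.dense(9.0, 9.0), Vectors.dense(9.5, 8.5)
    ).map(Tuple1.apply).toDF("features")

    val model = new KMeans().setK(2).setSeed(1L).fit(points)
    model.clusterCenters.foreach(println)     // two centers, near (0,0) and (9,9)

    // Assign each point to its cluster
    model.transform(points).show()
    spark.stop()
  }
}
```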
13. INTEGRATING APACHE FLUME AND APACHE KAFKA
13.1 What is Kafka, and why use it?
13.2 Kafka architecture and workflow
13.3 Configuring a Kafka cluster
13.4 Basic operations (see the sketch below)
13.5 Kafka monitoring tools
13.6 Integrating Apache Flume and Apache Kafka
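For the basic operations in 13.4, a minimal Kafka producer using the standard kafka-clients API from Scala. The broker address and the clicks topic are placeholders:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object KafkaProducerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // The broker address is a placeholder for your cluster
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)

    // Each record goes to a topic; the key determines the partition
    producer.send(new ProducerRecord("clicks", "user-1", "page=/home"))

    producer.close()
  }
}
```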
14. SPARK STREAMING
14.1 Introduction to Spark Streaming
14.2 The architecture of Spark Streaming
14.3 Working with the Spark Streaming program
14.4 Processing data using Spark Streaming
14.5 Requesting count and DStream
14.6 Multi-batch and sliding window operations (see the sketch below)
14.7 Working with advanced data sources
14.8 Features of Spark Streaming
14.9 Spark Streaming workflow
14.10 Initializing StreamingContext
14.11 Discretized Streams (DStreams)
14.12 Input DStreams and receivers
14.13 Transformations on DStreams
14.14 Output operations on DStreams
14.15 Windowed operators and their uses
14.16 Important windowed operators and stateful operators
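A sketch combining 14.10 and the sliding window operations of 14.6: a streaming word count over a socket source. It assumes something like `nc -lk 9999` is feeding text on the same machine:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingWordCount").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(5))   // 5-second batches
    ssc.checkpoint("checkpoint")                        // useful for windowed/stateful operators

    // Test source: feed text with `nc -lk 9999` on the same machine
    val lines = ssc.socketTextStream("localhost", 9999)

    // Sliding window (14.6): counts over the last 30 s, recomputed every 10 s
    val counts = lines.flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```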
15. HADOOP ADMINISTRATION – MULTI-NODE CLUSTER SETUP USING AMAZON EC2
15.1 Creating a 4-node Hadoop cluster
15.2 Running MapReduce jobs on the Hadoop cluster
15.3 Successfully running the MapReduce code
15.4 Working with the Cloudera Manager setup
16. HADOOP ADMINISTRATION – CLUSTER CONFIGURATION
16.1 Overview of Hadoop configuration
16.2 The importance of Hadoop configuration files
16.3 The various configuration parameters and their values (see the sketch below)
16.4 HDFS parameters and MapReduce parameters
16.5 Setting up the Hadoop environment
16.6 Include and exclude configuration files
16.7 The administration and maintenance of the NameNode, DataNodes, directory structures, and files
16.8 What is a file system image (fsimage)?
16.9 Understanding the edit log
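For 16.3 and 16.4, configuration values can be read back programmatically with Hadoop's Configuration API. A minimal sketch; the /etc/hadoop/conf paths are typical but vary by distribution:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

object ShowHadoopConfig {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Explicitly layer in the site files; adjust paths for your distribution
    conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"))
    conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"))

    // Key HDFS parameters discussed in 16.3 and 16.4
    println("fs.defaultFS    = " + conf.get("fs.defaultFS"))
    println("dfs.replication = " + conf.get("dfs.replication", "3 (default)"))
    println("dfs.blocksize   = " + conf.get("dfs.blocksize", "134217728 (default)"))
  }
}
```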
17. HADOOP ADMINISTRATION: MAINTENANCE, MONITORING, AND TROUBLESHOOTING
17.1 Introduction to the checkpoint procedure and NameNode failure
17.2 How to ensure the recovery procedure; safe mode; metadata and data backup; various potential problems and their solutions; what to look for; and how to add and remove nodes
18. ETL CONNECTIVITY WITH HADOOP ECOSYSTEM (SELF-PACED)
18.1 How do ETL tools work in the Big Data industry?
18.2 Introduction to ETL and data warehousing
18.3 Working with prominent use cases of Big Data in the ETL industry
18.4 End-to-end ETL PoC showing Big Data integration with the ETL tool