Hadoop Training Program
Ideal Training's Hadoop program is curated by industry experts and covers in-depth knowledge of all the major Hadoop tools. Through this online, industry-led Hadoop certification training, you will work on real-life industry cases.
Overview
- Lectures: 30
- Quizzes: 1
- Duration: 60 hours
- Skill level: Beginner
- Language: English
- Assessments: Yes
Learning Outcomes & Flexibility
- Over 18 modules and 55.5 hours of content!
- Learn from industry experts.
- Gain job-ready skills.
- Attend live instructor-led sessions.
- Get help creating a world-class resume, promoting your profile, building salary negotiation skills, and preparing through mock interview sessions.
- Lifetime access to recorded sessions.
- Get a personalized learning experience with a mentor who tracks your progress and provides insights.
- Job assistance and interview scheduling.
- Certification upon course completion.
Curriculum
1. HADOOP INSTALLATION AND SETUP
1.1 The architecture of a Hadoop cluster
1.2 What are high availability and federation?
1.3 How to set up a production cluster
1.4 Various shell commands in Hadoop (see the sketch below)
1.5 Understanding configuration files in Hadoop
1.6 Installing a single-node cluster with Cloudera Manager
1.7 Understanding Spark, Scala, Sqoop, Pig, and Flume
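To give a feel for item 1.4, the `hdfs dfs` shell commands have direct equivalents in Hadoop's FileSystem API, usable from Scala as-is. A minimal sketch, assuming a NameNode reachable at hdfs://localhost:9000 (a placeholder address) and a local file named data.txt:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsShellBasics {
  def main(args: Array[String]): Unit = {
    // Point the client at the cluster; the URI is a placeholder for your NameNode
    val conf = new Configuration()
    conf.set("fs.defaultFS", "hdfs://localhost:9000")
    val fs = FileSystem.get(conf)

    // Equivalent of `hdfs dfs -mkdir -p /user/demo`
    fs.mkdirs(new Path("/user/demo"))

    // Equivalent of `hdfs dfs -put data.txt /user/demo/`
    fs.copyFromLocalFile(new Path("data.txt"), new Path("/user/demo/data.txt"))

    // Equivalent of `hdfs dfs -ls /user/demo`
    fs.listStatus(new Path("/user/demo")).foreach(s => println(s.getPath))

    fs.close()
  }
}
```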
2. INTRODUCTION TO BIG DATA HADOOP AND UNDERSTANDING HDFS AND MAPREDUCE
2.1 Introducing Big Data and Hadoop
2.2 What is Big Data, and where does Hadoop fit in?
2.3 Two important Hadoop ecosystem components: MapReduce and HDFS
2.4 In-depth Hadoop Distributed File System – replication, block size, Secondary NameNode, and high availability; in-depth YARN – ResourceManager and NodeManager (see the sketch below)
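The replication and block-size behavior from 2.4 can also be inspected programmatically. A minimal Scala sketch, assuming site configuration files on the classpath and a file already stored at the hypothetical path /user/demo/data.txt:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object InspectHdfsFile {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration() // picks up core-site.xml/hdfs-site.xml on the classpath
    val fs   = FileSystem.get(conf)

    val status = fs.getFileStatus(new Path("/user/demo/data.txt"))
    println(s"Replication factor: ${status.getReplication}") // commonly 3 by default
    println(s"Block size (bytes): ${status.getBlockSize}")   // commonly 128 MB by default

    // Where each block physically lives (DataNode hostnames)
    fs.getFileBlockLocations(status, 0, status.getLen)
      .foreach(loc => println(loc.getHosts.mkString(", ")))

    fs.close()
  }
}
```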
3. DEEP DIVE INTO MAPREDUCE
3.1 Learning the working mechanism of MapReduce (illustrated in the sketch below)
3.2 Understanding the mapping and reducing stages in MapReduce
3.3 Various MapReduce terms such as input format, output format, partitioners, combiners, shuffle, and sort
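To preview how the stages in 3.1 and 3.2 fit together, the classic word count can be mimicked with plain Scala collections. This is only a conceptual illustration of the map, shuffle-and-sort, and reduce data flow, not the Hadoop MapReduce API itself:

```scala
object MapReduceConceptually {
  def main(args: Array[String]): Unit = {
    val lines = Seq("to be or not to be", "to do or not to do")

    // Map stage: each line becomes (word, 1) pairs
    val mapped = lines.flatMap(_.split(" ")).map(word => (word, 1))

    // Shuffle and sort: group all pairs by key, as the framework does between stages
    val shuffled = mapped.groupBy(_._1)

    // Reduce stage: sum the counts for each word
    val reduced = shuffled.map { case (word, pairs) => (word, pairs.map(_._2).sum) }

    reduced.toSeq.sortBy(_._1).foreach { case (w, c) => println(s"$w\t$c") }
  }
}
```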
4. INTRODUCTION TO HIVE
4.1 Introducing Hadoop Hive
4.2 Detailed architecture of Hive
4.3 Comparing Hive with Pig and RDBMS
4.4 Working with Hive Query Language (see the sketch below)
4.5 Creation of a database, tables, GROUP BY, and other clauses
4.6 Various types of Hive tables and HCatalog
4.7 Storing Hive results, Hive partitioning, and buckets
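As a taste of 4.4 and 4.5, HiveQL statements can be issued through a Hive-enabled SparkSession. A minimal sketch, assuming a configured Hive metastore; the retail database and orders table are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object HiveQlBasics {
  def main(args: Array[String]): Unit = {
    // Hive support lets spark.sql run HiveQL against the metastore
    val spark = SparkSession.builder()
      .appName("HiveQlBasics")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("CREATE DATABASE IF NOT EXISTS retail")

    // A partitioned table, as covered in 4.7
    spark.sql(
      """CREATE TABLE IF NOT EXISTS retail.orders (
        |  order_id INT, amount DOUBLE
        |) PARTITIONED BY (order_date STRING)""".stripMargin)

    // A GROUP BY clause, as covered in 4.5
    spark.sql(
      "SELECT order_date, SUM(amount) FROM retail.orders GROUP BY order_date"
    ).show()

    spark.stop()
  }
}
```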
5. ADVANCED HIVE AND IMPALA
-
5.1 Indexing in Hive
-
5.2 The map-side join in Hive
-
5.3 Working with complex data types
-
5.4 The Hive user-defined functions
-
5.5 Introduction to Impala
-
5.6 Comparing Hive with Impala
-
5.7 The detailed architecture of Impala
-
-
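For 5.4, here is one illustration of a user-defined function. This sketch registers a UDF through Spark's Hive-compatible SQL layer rather than the classic route of extending Hive's UDF class and issuing CREATE FUNCTION; the mask_email function is invented for the example:

```scala
import org.apache.spark.sql.SparkSession

object HiveUdfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveUdfSketch")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Register a simple UDF usable from SQL; masks every character of the
    // local part of an e-mail address except the first one
    spark.udf.register("mask_email", (email: String) =>
      email.replaceAll("(?<=.).(?=[^@]*@)", "*"))

    Seq("alice@example.com", "bob@example.com").toDF("email")
      .createOrReplaceTempView("users")

    spark.sql("SELECT email, mask_email(email) AS masked FROM users").show(false)
    spark.stop()
  }
}
```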
6. INTRODUCTION TO PIG
-
6.1 Apache Pig introduction and its various features .
-
6.2 Various data types and schema in Pig
-
6.3 The available functions in Pig, Hive bags, tuples, and fields
-
-
7. FLUME, SQOOP, AND HBASE
-
7.1 Apache Sqoop introduction
-
7.2 Importing and exporting data
-
7.3 Performance improvement with Sqoop
-
7.4 Sqoop limitations
-
7.5 Introduction to Flume and understanding the architecture of Flume
-
7.6 What are HBase and the CAP theorem?
-
-
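For 7.6, a minimal write-then-read against HBase using its standard Java client, called from Scala. The users table and its info column family are hypothetical, and hbase-site.xml is assumed to be on the classpath:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBasePutGet {
  def main(args: Array[String]): Unit = {
    // Reads hbase-site.xml from the classpath for the ZooKeeper quorum details
    val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = conn.getTable(TableName.valueOf("users")) // hypothetical table

    // Write one cell: row key "u1", column family "info", qualifier "name"
    val put = new Put(Bytes.toBytes("u1"))
    put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"))
    table.put(put)

    // Read it back
    val result = table.get(new Get(Bytes.toBytes("u1")))
    println(Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))))

    table.close(); conn.close()
  }
}
```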
8. WRITING SPARK APPLICATIONS USING SCALA
-
8.1 Using Scala for writing Apache Spark applications
-
8.2 Detailed study of Scala
-
8.3 The need for Scala
-
8.4 The concept of object-oriented programming
-
8.5 Executing the Scala code
-
8.6 Various classes in Scala such as getters, setters, constructors, abstract, extending objects, and overriding methods
-
8.7 The Java and Scala interoperability
-
8.8 The concept of functional programming and anonymous functions
-
8.9 Bobsrockets package and comparing the mutable and immutable collections
-
8.10 Scala REPL, lazy values, control structures in Scala, directed acyclic graph (DAG), first Spark application using SBT/Eclipse, Spark Web UI, and Spark in Hadoop ecosystem
-
-
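A compact sketch touching several of the Scala topics above: constructors, generated getters and setters, overriding (8.6), anonymous functions, and immutable collections (8.8, 8.9). The Rocket classes are invented for illustration:

```scala
// A constructor parameter marked `var` gives Scala-style getters and setters
class Rocket(var name: String, val stages: Int) {
  def describe(): String = s"$name with $stages stages"
}

// Subclassing with method overriding, as in 8.6
class ReusableRocket(name: String) extends Rocket(name, 2) {
  override def describe(): String = super.describe() + " (reusable)"
}

object ScalaBasics {
  def main(args: Array[String]): Unit = {
    val r = new ReusableRocket("Falcon")
    r.name = "Falcon 9"                   // calls the generated setter
    println(r.describe())

    // Anonymous function + immutable collection (8.8, 8.9)
    val lengths = List("map", "reduce", "shuffle").map(s => s.length)
    println(lengths)                      // List(3, 6, 7)
  }
}
```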
9. SPARK FRAMEWORK
-
9.1 Detailed Apache Spark and its various features
-
9.2 Comparing with Hadoop
-
9.3 Various Spark components
-
9.4 Combining HDFS with Spark and Scalding
-
9.5 Introduction to Scala
-
9.6 Importance of Scala and RDDs Hands-on Exercise: The resilient distributed dataset (RDD) in Spark, How does it help speed up Big Data processing?
-
-
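For the hands-on exercise, a minimal first Spark application in the spirit of 8.10: an RDD word count that can be built with SBT and run locally. The input file data.txt is a placeholder:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object FirstSparkApp {
  def main(args: Array[String]): Unit = {
    // local[*] runs on all local cores; on a cluster you would omit setMaster
    // and pass the master via spark-submit instead
    val conf = new SparkConf().setAppName("FirstSparkApp").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    // The RDD lineage is built lazily; nothing runs until an action is called
    val counts = sc.textFile("data.txt")        // hypothetical input file
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.take(10).foreach(println)            // the action that triggers execution
    sc.stop()
  }
}
```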
10. RDDS IN SPARK
-
10.1 Understanding Spark RDD operations
-
10.2 Comparison of Spark with MapReduce
-
10.3 What is a Spark transformation?
-
10.4 Loading data in Spark
-
10.5 Types of RDD operations, viz. transformation and action
-
10.6 What is a Key/Value pair?
-
-
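A short sketch of 10.5 and 10.6: transformations build a lazy lineage, actions trigger execution, and key/value pairs enable by-key aggregation. It runs locally with no external data:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddOperations {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("RddOperations").setMaster("local[*]"))

    val numbers = sc.parallelize(1 to 10)

    // Transformations are lazy: these lines only build a lineage graph
    val evens   = numbers.filter(_ % 2 == 0)
    val squared = evens.map(n => n * n)

    // Actions trigger the actual computation
    println(squared.collect().mkString(", "))   // 4, 16, 36, 64, 100
    println(squared.count())                    // 5

    // Key/value pairs unlock by-key operations such as reduceByKey
    val byParity = numbers.map(n => (n % 2, n)).reduceByKey(_ + _)
    byParity.collect().foreach(println)         // (0,30) and (1,25)

    sc.stop()
  }
}
```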
11. DATAFRAMES AND SPARK SQL
11.1 Spark SQL in detail
11.2 The significance of SQL in Spark for working with structured data
11.3 Spark SQL JSON support
11.4 Working with XML data and Parquet files
11.5 Creating a HiveContext
11.6 Writing a DataFrame to Hive
11.7 How to read data via JDBC
11.8 Significance of Spark DataFrames
11.9 How to create a DataFrame
11.10 What is manual schema inference?
11.11 Working with CSV files, JDBC table reading, data conversion from a DataFrame to JDBC, Spark SQL user-defined functions, shared variables, and accumulators
11.12 How to query and transform data in DataFrames (see the sketch below)
11.13 How a DataFrame provides the benefits of both Spark RDDs and Spark SQL
11.14 Deploying Hive on Spark as the execution engine
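A sketch pulling together 11.3, 11.9, and 11.12: reading JSON into a DataFrame, querying it with SQL, and writing Parquet. The input file people.json is a placeholder:

```scala
import org.apache.spark.sql.SparkSession

object DataFrameBasics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DataFrameBasics").master("local[*]").getOrCreate()

    // Spark infers the schema from the JSON automatically (11.3)
    val people = spark.read.json("people.json")     // hypothetical input file
    people.printSchema()

    // Register the DataFrame so plain SQL can query it (11.12)
    people.createOrReplaceTempView("people")
    val adults = spark.sql("SELECT name, age FROM people WHERE age >= 18")
    adults.show()

    // Persist the result as Parquet, one of the formats covered in 11.4
    adults.write.mode("overwrite").parquet("adults.parquet")
    spark.stop()
  }
}
```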
12. MACHINE LEARNING USING SPARK (MLLIB)
12.1 Introduction to Spark MLlib
12.2 Understanding various algorithms
12.3 What are iterative algorithms in Spark?
12.4 Spark graph processing analysis
12.5 Introducing Machine Learning
12.6 K-means clustering (see the sketch below)
12.7 Spark variables such as shared and broadcast variables
12.8 What are accumulators?
12.9 Various ML algorithms supported by MLlib
12.10 Linear regression, logistic regression, decision tree, random forest, and k-means clustering techniques
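For 12.6, a minimal k-means example with Spark MLlib's DataFrame-based API. The four toy points are invented so the two expected clusters are obvious:

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object KMeansSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KMeansSketch").master("local[*]").getOrCreate()
    import spark.implicits._

    // Toy 2-D points; MLlib expects a "features" vector column
    val points = Seq(
      Vectors.dense(0.0, 0.0), Vectors.dense(0.5, 0.5),
      Vectors.dense(9.0, 9.0), Vectors.dense(9.5, 8.5)
    ).map(Tuple1.apply).toDF("features")

    val model = new KMeans().setK(2).setSeed(1L).fit(points)
    model.clusterCenters.foreach(println)     // two centers, near (0,0) and (9,9)

    // Assign each point to its cluster
    model.transform(points).show()
    spark.stop()
  }
}
```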
13. INTEGRATING APACHE FLUME AND APACHE KAFKA
13.1 What is Kafka, and why use it?
13.2 Kafka architecture and workflow
13.3 Configuring a Kafka cluster
13.4 Basic operations (see the sketch below)
13.5 Kafka monitoring tools
13.6 Integrating Apache Flume and Apache Kafka
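For the basic operations in 13.4, a minimal Kafka producer using the standard kafka-clients API from Scala. The broker address and the clicks topic are placeholders:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object KafkaProducerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // The broker address is a placeholder for your cluster
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)

    // Each record goes to a topic; the key determines the partition
    producer.send(new ProducerRecord("clicks", "user-1", "page=/home"))

    producer.close()
  }
}
```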
14. SPARK STREAMING
14.1 Introduction to Spark Streaming
14.2 The architecture of Spark Streaming
14.3 Working with the Spark Streaming program
14.4 Processing data using Spark Streaming
14.5 Requesting count and DStream
14.6 Multi-batch and sliding window operations (see the sketch below)
14.7 Working with advanced data sources
14.8 Features of Spark Streaming
14.9 Spark Streaming workflow
14.10 Initializing StreamingContext
14.11 Discretized Streams (DStreams)
14.12 Input DStreams and receivers
14.13 Transformations on DStreams
14.14 Output operations on DStreams
14.15 Windowed operators and their uses
14.16 Important windowed operators and stateful operators
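A sketch combining 14.10 and the sliding window operations of 14.6: a streaming word count over a socket source. It assumes something like `nc -lk 9999` is feeding text on the same machine:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingWordCount").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(5))   // 5-second batches
    ssc.checkpoint("checkpoint")                        // useful for windowed/stateful operators

    // Test source: feed text with `nc -lk 9999` on the same machine
    val lines = ssc.socketTextStream("localhost", 9999)

    // Sliding window (14.6): counts over the last 30 s, recomputed every 10 s
    val counts = lines.flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```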
15. HADOOP ADMINISTRATION – MULTI-NODE CLUSTER SETUP USING AMAZON EC2
15.1 Creating a 4-node Hadoop cluster
15.2 Running MapReduce jobs on the Hadoop cluster
15.3 Successfully running the MapReduce code
15.4 Working with the Cloudera Manager setup
16. HADOOP ADMINISTRATION – CLUSTER CONFIGURATION
16.1 Overview of Hadoop configuration
16.2 The importance of Hadoop configuration files
16.3 The various configuration parameters and their values (see the sketch below)
16.4 HDFS parameters and MapReduce parameters
16.5 Setting up the Hadoop environment
16.6 Include and exclude configuration files
16.7 The administration and maintenance of the NameNode, DataNodes, directory structures, and files
16.8 What is a file system image (fsimage)?
16.9 Understanding the edit log
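For 16.3 and 16.4, configuration values can be read back programmatically with Hadoop's Configuration API. A minimal sketch; the /etc/hadoop/conf paths are typical but vary by distribution:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

object ShowHadoopConfig {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Explicitly layer in the site files; adjust paths for your distribution
    conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"))
    conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"))

    // Key HDFS parameters discussed in 16.3 and 16.4
    println("fs.defaultFS    = " + conf.get("fs.defaultFS"))
    println("dfs.replication = " + conf.get("dfs.replication", "3 (default)"))
    println("dfs.blocksize   = " + conf.get("dfs.blocksize", "134217728 (default)"))
  }
}
```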
17. HADOOP ADMINISTRATION: MAINTENANCE, MONITORING, AND TROUBLESHOOTING
17.1 Introduction to the checkpoint procedure and NameNode failure
17.2 How to ensure the recovery procedure; safe mode; metadata and data backup; various potential problems and their solutions; what to look for; and how to add and remove nodes
18. ETL CONNECTIVITY WITH HADOOP ECOSYSTEM (SELF-PACED)
18.1 How do ETL tools work in the Big Data industry?
18.2 Introduction to ETL and data warehousing
18.3 Working with prominent use cases of Big Data in the ETL industry
18.4 End-to-end ETL PoC showing Big Data integration with the ETL tool