Is JobTracker a single point of failure too (besides NameNode) in Hadoop? - hadoop

I am new to Hadoop. In Hadoop, I know that when the NameNode fails the entire Hadoop framework goes down, so it's a single point of failure. Is it the same for the JobTracker? Because if the JobTracker goes down, there would be no daemon to contact the NameNode after a job submission, and also no point in running the TaskTrackers. How is this handled exactly?

Yes, the JobTracker is a single point of failure in MRv1. If the JobTracker fails, all running jobs are halted (http://wiki.apache.org/hadoop/JobTracker).
In YARN, the ResourceManager is not a single point of failure.
If you need MRv1, you can use the MapR distribution, which provides JobTracker high availability (http://www.mapr.com/resources/videos/demo-hadoop-jobtracker-failing-and-recovering-mapr-cluster).

JobTracker HA (high availability with an active and a standby JobTracker) can be configured in the Cloudera Hadoop distribution. See the following link; this feature is available from CDH 4.2.1 onwards:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-High-Availability-Guide/cdh4hag_topic_3_1.html
The same can be configured in the Hortonworks distribution as well:
http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.2/bk_hdp1-system-admin-guide/content/sysadminguides_ha_chap2_5_5.html
In MR2 the master service is the ResourceManager, which is not a single point of failure.
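For reference, on a plain Apache Hadoop 2.x cluster, ResourceManager HA is driven by a handful of yarn-site.xml properties. Below is a minimal sketch of those settings expressed through the Java Configuration API; the hostnames, ZooKeeper quorum and cluster id are placeholders, not values taken from the linked guides.
```java
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RmHaConfigSketch {
    // Returns a configuration with the core yarn-site.xml properties for RM HA.
    public static YarnConfiguration rmHaConf() {
        YarnConfiguration conf = new YarnConfiguration();
        conf.setBoolean("yarn.resourcemanager.ha.enabled", true);
        conf.set("yarn.resourcemanager.cluster-id", "yarn-cluster");               // placeholder id
        conf.set("yarn.resourcemanager.ha.rm-ids", "rm1,rm2");
        conf.set("yarn.resourcemanager.hostname.rm1", "master1.example.com");      // placeholder host
        conf.set("yarn.resourcemanager.hostname.rm2", "master2.example.com");      // placeholder host
        conf.set("yarn.resourcemanager.zk-address", "zk1:2181,zk2:2181,zk3:2181"); // placeholder ZK quorum
        return conf;
    }
}
```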

Yes, the JobTracker is a single point of failure in MRv1. (Note that the Secondary NameNode is only a checkpointing helper; it does not automatically take over when the NameNode fails. A hot standby NameNode only exists in the HA setup introduced in Hadoop 2.) In MR-II the ResourceManager concept was introduced. YARN can run a number of ResourceManagers: one is active and the others are in standby mode, and if the active one fails, a standby ResourceManager takes charge.

No, a NameNode failure does not bring down the whole Hadoop framework; the framework is a layer on all nodes, and a NameNode failure is a different thing. But if the NameNode goes down, the framework no longer knows where data blocks are stored or where free space is available, so it is not possible to store or read the actual data.
The JobTracker coordinates with the NameNode to get the data to be processed, so when the NameNode fails, the JobTracker cannot work properly either. The NameNode must be healthy first; this is what is meant by the NameNode single point of failure in Hadoop.
The JobTracker is responsible for scheduling jobs and processing the data. If the JobTracker is not working, a client can still try to submit a job request, but nothing knows where that job should be submitted or where it should be processed. So on a JobTracker failure, it is not possible to process data or schedule jobs.
These were the biggest availability problems for big data analysis with Hadoop 1.x.
Hadoop 2.x resolves both problems: YARN removes the single point of failure at the resource-management level, and HDFS HA removes it at the NameNode level.
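To make the Hadoop 2.x part concrete, here is a minimal, hedged sketch of the client-side configuration for an HDFS HA nameservice; the nameservice name "mycluster" and the hostnames are placeholders. The client talks to the logical nameservice rather than to a specific NameNode, so a failover from active to standby is transparent to it.
```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHaClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://mycluster");                              // logical nameservice
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "nn1.example.com:8020"); // placeholder host
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "nn2.example.com:8020"); // placeholder host
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                 "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

        // The client never names a specific NameNode, so failover is handled for it.
        FileSystem fs = FileSystem.get(conf);
        System.out.println(fs.exists(new Path("/")));
    }
}
```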

Related

MapReduce Architecture

I have created a diagram that represents how the MapReduce framework works. Could somebody please validate that this is an accurate representation?
P.S. For the purpose of this example, we are also interested in the system components shown in this diagram.
The MapReduce architecture works in several phases to execute a job. Here are the stages of running a MapReduce application (a minimal driver sketch follows the list):
The first stage involves the user writing the input data into HDFS for further processing. This data is stored on different nodes in the form of blocks in HDFS.
Now the client submits its MapReduce job.
Then, the resource manager launches a container to start the App master.
The App master sends a resource request to the resource manager.
The resource manager now allocates containers on slaves via the node manager.
The App master starts respective tasks in the containers.
The job is now executed in those containers.
When the processing is complete, the resource manager deallocates the resources.
Source: Cloudera
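As a concrete illustration of steps 2-7, here is a minimal MapReduce driver sketch using the plain Hadoop Java API; the input and output paths are placeholders. Calling waitForCompletion() is what hands the job to the ResourceManager, which then launches the ApplicationMaster container and, through it, the map and reduce task containers.
```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {

    // Map phase: emit (word, 1) for every token in a line.
    public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context ctx) throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    ctx.write(word, ONE);
                }
            }
        }
    }

    // Reduce phase: sum the counts for each word.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/user/demo/input"));    // placeholder input in HDFS (step 1)
        FileOutputFormat.setOutputPath(job, new Path("/user/demo/output")); // placeholder output path
        // Submits the job to the ResourceManager and waits for it to finish (steps 2-8).
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```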
JobTracker, TaskTracker, and MasterNode aren't real things in Hadoop 2+ w/ YARN. Jobs are submitted to a ResourceManager, which creates an ApplicationMaster on one of the NodeManagers.
"Slave Nodes" are commonly also your DataNodes because that is the core tenant of Hadoop - move the processing to the data.
The "Recieve the data" arrow is bi-directional, and there is no arrow from the NameNode to the DataNode. 1) Get the file locations from the NameNode, then locations are sent back to clients. 2) The clients (i.e. NodeManager processes running on a DataNode, or "slave nodes"), will directly read from the DataNodes themselves - the datanodes don't know directly where the other slave nodes exist.
That being said, HDFS and YARN are typically all part of the same "bubble", so the "HDFS" labelled circle you have should really be around everything.
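To illustrate the two-step read path in 1) and 2), here is a small, hedged sketch using the plain HDFS client API: the block-location call only touches NameNode metadata, while the open/read streams bytes straight from the DataNodes. The file path is a placeholder.
```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadFlowSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/user/demo/data.txt");          // placeholder path
        FileStatus status = fs.getFileStatus(file);

        // Step 1: metadata only -- the NameNode answers which DataNodes hold each block.
        for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("offset " + loc.getOffset() + " on " + String.join(",", loc.getHosts()));
        }

        // Step 2: the actual bytes are streamed directly from those DataNodes.
        try (FSDataInputStream in = fs.open(file)) {
            byte[] buf = new byte[4096];
            int n = in.read(buf);
            System.out.println("read " + n + " bytes");
        }
    }
}
```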

YARN and Hadoop

I had a couple of questions regarding job submission to HDFS and the YARN architecture in Hadoop:
So in the Hadoop ecosystem you have one NameNode per cluster, and the cluster can contain any number of DataNodes that store your data. When you submit a job to Hadoop, the JobTracker on the NameNode will pick up the job and assign it to a TaskTracker on a DataNode where the file's blocks are present.
So my question is: how do the components of YARN work together with HDFS?
So YARN consists of the NodeManager and the ResourceManager. Out of these two components: does the NodeManager run on every DataNode, while the ResourceManager runs on the NameNode of each cluster? So when the TaskTracker (on a DataNode) gets assigned a task from the JobTracker (on the NameNode), the NodeManager on that DataNode will create a container which will request resources from the ResourceManager on the NameNode. So the ResourceManager and NodeManager only come into play when a TaskTracker on a DataNode gets a job from the JobTracker on the NameNode, in which case the NodeManager will ask the ResourceManager for resources for the job to be executed. Is this correct?
You are partially correct. YARN was brought into the picture to relieve the JobTracker of the burden of doing both scheduling and monitoring. So with YARN you don't have any JobTracker or TaskTracker. The work done by the JobTracker is now done by the ResourceManager, which has two main components: the Scheduler (allocating resources to applications) and the ApplicationsManager (accepting job submissions and restarting the ApplicationMaster in case of any failure). Each application then has an ApplicationMaster, which negotiates containers (where the job is run) from the Scheduler for the running application.
The NodeManager runs on every slave node/DataNode. The ResourceManager may or may not be installed where the NameNode is present; for a large cluster we usually separate the masters, so that the load doesn't fall on a single machine.
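As a small, hedged illustration of that division of labour, the sketch below uses the YarnClient API to ask the ResourceManager what it knows about: the NodeManagers heartbeating to it and the applications (each with its own ApplicationMaster) that the ApplicationsManager has accepted.
```java
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ClusterViewSketch {
    public static void main(String[] args) throws Exception {
        YarnClient yarn = YarnClient.createYarnClient();
        yarn.init(new YarnConfiguration());
        yarn.start();

        // Every NodeManager that has registered with (and heartbeats to) the ResourceManager.
        for (NodeReport node : yarn.getNodeReports(NodeState.RUNNING)) {
            System.out.println(node.getNodeId() + " -> " + node.getCapability());
        }

        // Every application the ApplicationsManager has accepted; each one owns an ApplicationMaster.
        for (ApplicationReport app : yarn.getApplications()) {
            System.out.println(app.getApplicationId() + " : " + app.getYarnApplicationState());
        }

        yarn.stop();
    }
}
```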

What happens to hadoop job when the NameNode is down?

In Hadoop 1.2.1, I would like a basic understanding of the questions below:
Who receives the Hadoop job? Is it the NameNode or the JobTracker?
What will happen if somebody submits a Hadoop job when the NameNode is down? Does the job fail, or does it get put on hold?
What will happen if somebody submits a Hadoop job when the JobTracker is down? Does the job fail, or does it get put on hold?
By Hadoop job, you probably mean a MapReduce job. If your NameNode is down and you don't have a spare one (in an HA setup), your HDFS will not be working, and every component that depends on the HDFS namespace will be either stuck or crashed.
1) The JobTracker (the YARN ResourceManager with Hadoop 2.x).
2) I am not completely sure, but the job will probably be submitted and then fail.
3) You cannot submit a job to a stopped JobTracker.
The client submits the job to the JobTracker; the NameNode is contacted to look up the data requested and returns the block information.
The JobTracker is responsible for the job being completed and for the allocation of resources to the job.
In cases 2 & 3 the job fails.

Queries about YARN (failure modes, container size, practical example)

I want to ask a few questions to understand how YARN works:
Can anyone explain, or point to a document that clearly describes, the failure modes in YARN (i.e. task failure, ApplicationMaster failure, NodeManager failure, ResourceManager failure)?
What is the container size in YARN? Is it the same as a slot in MapReduce 1?
Any practical/working example of YARN?
Thank you
Refer to the Hadoop: The Definitive Guide textbook; apart from that, there is a lot of information on the Apache website.
The container size is not fixed; it is dynamically allocated by the ResourceManager based on what is requested.
From a developer's perspective, the same old MapReduce code will work on YARN.
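To make the container-size point concrete, here is a hedged sketch of the properties involved (the numbers are examples, not recommendations): the cluster sets minimum/maximum allocation bounds, and each job asks for the memory and vcores it needs, so there is no fixed slot as in MapReduce 1.
```java
import org.apache.hadoop.conf.Configuration;

public class ContainerSizingSketch {
    // Example values only; every request is a memory/vcore ask that the scheduler
    // rounds up to the configured allocation increments.
    static Configuration containerSizing() {
        Configuration conf = new Configuration();
        // Cluster-side bounds (yarn-site.xml).
        conf.setInt("yarn.scheduler.minimum-allocation-mb", 1024);
        conf.setInt("yarn.scheduler.maximum-allocation-mb", 8192);
        // Per-job requests (mapred-site.xml or job configuration).
        conf.setInt("mapreduce.map.memory.mb", 2048);
        conf.setInt("mapreduce.reduce.memory.mb", 4096);
        conf.setInt("mapreduce.map.cpu.vcores", 1);
        return conf;
    }
}
```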
ResourceManager failures
In the initial versions of the YARN framework, a ResourceManager failure meant a total cluster failure, as it was a single point of failure. The ResourceManager stores the state of the cluster, such as the metadata of the submitted applications, information on cluster resource containers, the cluster's general configuration, and so on. So if the ResourceManager went down because of a hardware failure, there was no way to avoid manually debugging the cluster and restarting the ResourceManager. While the ResourceManager was down, the cluster was unavailable, and once it was restarted, all jobs needed a restart, so half-completed jobs lost their work. In short, a restart of the ResourceManager used to restart all the running ApplicationMasters. The latest versions of YARN address this problem in two ways. One way is an active-passive ResourceManager architecture, so that when one goes down, another becomes active and takes responsibility for the cluster. The other is to store the ResourceManager state externally in a ZooKeeper quorum, with one ResourceManager in the active state and one or more ResourceManagers in passive mode, waiting for the event that brings them to the active state.
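For reference, both mechanisms map to a handful of yarn-site.xml properties; here is a minimal sketch (the ZooKeeper quorum is a placeholder):
```java
import org.apache.hadoop.conf.Configuration;

public class RmRecoverySketch {
    // yarn-site.xml settings behind the two mechanisms described above.
    static Configuration rmRecoveryConf() {
        Configuration conf = new Configuration();
        // Active/passive ResourceManagers with automatic failover.
        conf.setBoolean("yarn.resourcemanager.ha.enabled", true);
        conf.setBoolean("yarn.resourcemanager.ha.automatic-failover.enabled", true);
        // Persist RM state externally in ZooKeeper so a restarted or failed-over RM can recover it.
        conf.setBoolean("yarn.resourcemanager.recovery.enabled", true);
        conf.set("yarn.resourcemanager.store.class",
                 "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore");
        conf.set("yarn.resourcemanager.zk-address", "zk1:2181,zk2:2181,zk3:2181"); // placeholder quorum
        return conf;
    }
}
```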
ApplicationMaster failures
When the ApplicationMaster fails, the ResourceManager simply starts another container with a new ApplicationMaster running in it, as a new application attempt. It is the responsibility of the new ApplicationMaster to recover the state of the old one, and this is possible only if ApplicationMasters persist their state in an external location so that it can be used for future reference. ApplicationMasters that do this store their state on persistent storage, so all progress up to the failure can be recovered.
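The number of such application attempts is bounded by configuration; a hedged sketch of the relevant knobs (the values shown are the usual defaults):
```java
import org.apache.hadoop.conf.Configuration;

public class AmRetrySketch {
    // Attempt limits consulted when an ApplicationMaster dies.
    static Configuration amRetryConf() {
        Configuration conf = new Configuration();
        conf.setInt("yarn.resourcemanager.am.max-attempts", 2); // cluster-wide ceiling on attempts per application
        conf.setInt("mapreduce.am.max-attempts", 2);            // per-job limit for the MapReduce AM
        return conf;
    }
}
```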
NodeManager Failures
If a NodeManager fails, the ResourceManager detects the failure via a timeout (that is, it stops receiving heartbeats from the NodeManager). The ResourceManager then removes the NodeManager from its pool of available NodeManagers, kills all the containers running on that node, and reports the failure to all running ApplicationMasters. The ApplicationMasters are then responsible for reacting to the node failure, by redoing the work done by any containers that were running on that node at the time of the fault.
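The heartbeat interval and the expiry timeout behind that detection are configurable; a minimal sketch with the usual defaults:
```java
import org.apache.hadoop.conf.Configuration;

public class NmLivenessSketch {
    // The timeout-based failure detection described above (values shown are the defaults).
    static Configuration nmLivenessConf() {
        Configuration conf = new Configuration();
        // NodeManager -> ResourceManager heartbeat interval.
        conf.setInt("yarn.resourcemanager.nodemanagers.heartbeat-interval-ms", 1000);
        // How long the RM waits without heartbeats before declaring the NodeManager lost (10 minutes).
        conf.setInt("yarn.nm.liveness-monitor.expiry-interval-ms", 600000);
        return conf;
    }
}
```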
Container Failures
Container failures are reported by the NodeManager to the ResourceManager, and the ResourceManager informs the corresponding ApplicationMaster. The ApplicationMaster then retries the failed work in a new container.
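For MapReduce applications the retry budget per task is also configurable; a hedged sketch (the values shown are the defaults):
```java
import org.apache.hadoop.conf.Configuration;

public class TaskRetrySketch {
    // How many attempts (i.e. fresh containers) the MapReduce AM will request for a
    // failing task before failing the whole job.
    static Configuration taskRetryConf() {
        Configuration conf = new Configuration();
        conf.setInt("mapreduce.map.maxattempts", 4);
        conf.setInt("mapreduce.reduce.maxattempts", 4);
        return conf;
    }
}
```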

How does Apache Spark handle system failure when deployed in YARN?

Preconditions
Let's assume Apache Spark is deployed on a Hadoop cluster using YARN, and a Spark job is running. How does Spark handle the situations listed below?
Cases & Questions
One node of the Hadoop cluster fails due to a disk error. However, replication is high enough and no data was lost.
What will happen to the tasks that were running on that node?
One node of the Hadoop cluster fails due to a disk error. Replication was not high enough and data was lost; Spark simply can no longer find a file that was pre-configured as a resource for the workflow.
How will it handle this situation?
During execution the primary NameNode fails over.
Does Spark automatically use the failover NameNode?
What happens when the standby NameNode fails as well?
For some reason, during a workflow, the cluster is totally shut down.
Will Spark restart with the cluster automatically?
Will it resume from the last "save" point of the workflow?
I know, some questions might sound odd. Anyway, I hope you can answer some or all.
Thanks in advance. :)
Here are the answers to the questions given on the mailing list (the answers were provided by Sandy Ryza of Cloudera):
"Spark will rerun those tasks on a different node."
"After a number of failed task attempts trying to read the block, Spark would pass up whatever error HDFS is returning and fail the job."
"Spark accesses HDFS through the normal HDFS client APIs. Under an HA configuration, these will automatically fail over to the new namenode. If no namenodes are left, the Spark job will fail."
Restarting is part of cluster administration, and "Spark has support for checkpointing to HDFS, so you would be able to go back to the last time checkpoint was called that HDFS was available."
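As a minimal sketch of the checkpointing mentioned in that last answer (Java RDD API; the checkpoint directory is a placeholder): data checkpointed this way is written to HDFS, so a restarted job can be written to pick up from that point instead of recomputing everything from scratch.
```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class CheckpointSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("checkpoint-sketch");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Checkpoint data lands in HDFS, so it survives executor loss and job restarts.
            sc.setCheckpointDir("hdfs:///user/demo/checkpoints");   // placeholder HDFS path

            JavaRDD<Integer> squares = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5))
                                         .map(x -> x * x);
            squares.checkpoint();          // materialized to the checkpoint dir on the next action
            System.out.println(squares.count());
        }
    }
}
```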
