I submitted a job to a cluster of 4 hosts and could see that it was correctly spread across the 4 nodes, 1 map task per node.
Later on, one of the nodes failed.
I stopped the tasktracker on the failed node, added the ID of that node to the excludes file, and updated the list of nodes with hadoop mradmin -refreshNodes. The failed node disappeared from the list of available nodes on the Hadoop administration pages.
Then I started the tasktracker again, updated the nodes with mradmin, and saw the node appear in the JobTracker list again.
While the node was down, Hadoop rescheduled its map task on another node, so that node was now running 2 map tasks. This left the cluster unbalanced:
2 nodes were running 1 task each,
1 node was running 2 tasks
and 1 node (the one I restarted) was running no tasks.
I killed the extra attempt with hadoop job -kill-task attempt_201308010141_0001_m_000000_1, and it looks like it never starts again: I now see 3 nodes running 1 task each, 1 node with no tasks at all, and 1 pending task in the list.
Am I missing something? What is the correct way of 'moving' a task from one node to another?
Jobs keep a list of blacklisted tasktrackers (there is a global blacklist and a per-job one).
I think that's why your new attempt never starts on the restarted tasktracker.
You can try these commands:
hadoop job -unblacklist <jobid> <hostname>
hadoop job -unblacklist-tracker <hostname>
From http://doc.mapr.com/display/MapR/TaskTracker+Blacklisting
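If the restarted tasktracker is blacklisted, a small wrapper can clear both lists for one host in one go. This is a minimal sketch that only strings together the two commands quoted above (they come from the MapR documentation and may not exist in other Hadoop distributions). The function name unblacklist_host and the DRY_RUN switch are hypothetical; the job ID is derived from the attempt ID in the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clear the per-job and the global blacklist entry for
# one tasktracker host, using the two MapR commands quoted above.
# Set DRY_RUN=1 to print the commands instead of running them.
unblacklist_host() {
  local job_id="$1" host="$2" run=""
  if [ "${DRY_RUN:-0}" = "1" ]; then
    run="echo"
  fi
  # Per-job blacklist entry for this host.
  $run hadoop job -unblacklist "$job_id" "$host"
  # Cluster-wide (global) blacklist entry for the tasktracker.
  $run hadoop job -unblacklist-tracker "$host"
}
```

For example, DRY_RUN=1 unblacklist_host job_201308010141_0001 node4 only prints the two commands, so you can check them before running for real (node4 is a placeholder hostname).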
I want to build a cluster environment where nodes are automatically created and deleted. The jobs are to be distributed to the various nodes using Slurm.
Two questions:
Is there an agent or similar for the Slurm workers so that the nodes automatically register with the head node?
Is it possible to change the Slurm config file at runtime, since worker nodes could be added or deleted?
You would need to restart the Slurm daemons for changes to slurm.conf to take effect, which could be problematic for running jobs. You may also get errors (job failures or worse) if the Slurm control daemon finds that a node's slurm.conf is different, due to a checksum mismatch (see the official FAQ on adding nodes: https://slurm.schedmd.com/faq.html#add_nodes).
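Since the failure mode above is a checksum mismatch, it may help to confirm that every node has an identical copy of slurm.conf before restarting the daemons. The sketch below is a local check under assumptions, not a Slurm tool: it compares cksum output for copies of slurm.conf you have already gathered (for example with scp into one directory). Slurm computes its own hash, so this only approximates its check, and the function name conf_copies_match is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if all given copies of slurm.conf are
# byte-identical (same CRC and size according to cksum).
conf_copies_match() {
  local f sum first=""
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "MISSING: $f"
      return 1
    fi
    # Reading from stdin makes cksum print "CRC SIZE" without the file name,
    # so copies with different paths can be compared directly.
    sum="$(cksum < "$f")"
    if [ -z "$first" ]; then
      first="$sum"
    elif [ "$sum" != "$first" ]; then
      echo "MISMATCH: $f"
      return 1
    fi
  done
  return 0
}
```

A non-zero exit status names the first copy that differs from the first file given, which is a hint about which node still needs the updated file.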
I set up a cluster consisting of 1 master and 3 workers.
Normally, as we know, when users submit jobs, the jobs are distributed to the three workers for execution.
However, suppose I want to assign jobs like this:
job id_1 to worker 1 and worker 2, but no worker 3
job id_2 to worker 1, worker 2 and worker 3
job id_3 to worker 2 and worker 3, but no worker 1
Can Spark do this through a configuration setting or scheduling, or by writing code that assigns each job to specific workers?
Any ideas or recommended methods are welcome.
You should not do this, because it will make your job slower and create unwanted problems.
Set location preference! If you know the names of all the worker machines, you can create an RDD with a version of parallelize where you can set the preferred location of each partition. That will ensure deterministic behavior of sending each partition to the corresponding worker (assuming speculative execution and delay scheduling are turned off).
To figure out the names of the worker nodes without hardcoding, you could run a dummy Spark job with many, many partitions that returns the hostnames of all the workers. Note that this will try to ensure (but not guarantee) that at least one partition is scheduled on each active worker. In fact, if other jobs are running in the system, these dummy tasks probably will not get scheduled on all the workers. It is hard to get around this without some outside mechanism to know all the workers in the cluster.
I have never tried submitting a job the way you are trying to.
This may be a hint toward a possible solution for your question (Spark Reply).
Also go through the Cluster Mode documentation.
I am confused. When I run the command "hadoop dfsadmin -report", I can see the following there:
but the ResourceManager's Cluster Metrics page shows this:
Why is that, and what could cause it?
Thanks in advance!
You are connected to 9 slave nodes, but only 5 of them are in the active state; the remaining ones are in the unhealthy state.
Reason for unhealthy state:
Hadoop MapReduce provides a mechanism by which administrators can configure the TaskTracker to run an administrator supplied script periodically to determine if a node is healthy or not. Administrators can determine if the node is in a healthy state by performing any checks of their choice in the script. If the script detects the node to be in an unhealthy state, it must print a line to standard output beginning with the string ERROR. The TaskTracker spawns the script periodically and checks its output. If the script's output contains the string ERROR, as described above, the node's status is reported as 'unhealthy' and the node is black-listed on the JobTracker. No further tasks will be assigned to this node. However, the TaskTracker continues to run the script, so that if the node becomes healthy again, it will be removed from the blacklisted nodes on the JobTracker automatically. The node's health along with the output of the script, if it is unhealthy, is available to the administrator in the JobTracker's web interface. The time since the node was healthy is also displayed on the web interface.
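To illustrate the mechanism quoted above, here is a minimal health script sketch. The check itself (that some local directories are writable) and the HEALTH_CHECK_DIRS variable are hypothetical; the only contract taken from the documentation is that a line beginning with ERROR marks the node unhealthy. The script would be referenced from the health-checker script path property (mapred.healthChecker.script.path in Hadoop 1, yarn.nodemanager.health-checker.script.path for YARN NodeManagers).

```shell
#!/usr/bin/env bash
# Sketch of a node health script: print a line starting with "ERROR" for each
# problem found, and print nothing when the node looks healthy.
check_dirs_writable() {
  local dir probe
  for dir in "$@"; do
    probe="$dir/.health_probe.$$"
    # Create the probe file in a subshell so a failed redirection cannot
    # abort the script; the shell's own error message is discarded.
    if ( : > "$probe" ) 2>/dev/null; then
      rm -f "$probe"
    else
      echo "ERROR: cannot write to $dir"
    fi
  done
}

# HEALTH_CHECK_DIRS is a hypothetical space-separated list of directories
# (for example your local task directories); /tmp is only a placeholder.
# Left unquoted on purpose so the list is split on spaces.
check_dirs_writable ${HEALTH_CHECK_DIRS:-/tmp}
```

Because the output, not the exit code, is what the quoted documentation describes, the sketch always exits normally and signals problems only through ERROR lines.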
Reason for Lost Nodes:
I think some blocks (data) may not be available on the slaves, which is why it shows 9 lost nodes.
To remove dead nodes from the cluster, see To Decommission Nodes.
The Cluster Metrics in the ResourceManager show the status of the NodeManagers.
hadoop dfsadmin -report shows the status of the DataNodes.
I'm running Hadoop 1.1.2 on a cluster with 10+ machines. I would like to scale up and down nicely, for both HDFS and MapReduce. By "nicely", I mean that data must not be lost (HDFS nodes are allowed to decommission), and nodes running a task must finish it before shutting down.
I've noticed the datanode process dies once decommissioning is done, which is good. This is what I do to remove a node:
Add node to mapred.exclude
Add node to hdfs.exclude
$ hadoop mradmin -refreshNodes
$ hadoop dfsadmin -refreshNodes
$ hadoop-daemon.sh stop tasktracker
To add the node back in (assuming it was removed like above), this is what I'm doing.
Remove from mapred.exclude
Remove from hdfs.exclude
$ hadoop mradmin -refreshNodes
$ hadoop dfsadmin -refreshNodes
$ hadoop-daemon.sh start tasktracker
$ hadoop-daemon.sh start datanode
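The remove and re-add steps above can be wrapped in shell functions so that the exclude files are edited the same way every time. This is a sketch under assumptions: the exclude-file paths are hypothetical defaults (point MAPRED_EXCLUDE and HDFS_EXCLUDE at the files named by mapred.hosts.exclude and dfs.hosts.exclude), the functions run on the master, and the hadoop-daemon.sh start/stop commands still have to be run on the node itself.

```shell
#!/usr/bin/env bash
# Hypothetical default paths; override them to match your configuration.
MAPRED_EXCLUDE="${MAPRED_EXCLUDE:-/etc/hadoop/mapred.exclude}"
HDFS_EXCLUDE="${HDFS_EXCLUDE:-/etc/hadoop/hdfs.exclude}"

# Append a host to an exclude file unless it is already listed.
add_host() {
  grep -qxF "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Remove every line that exactly matches the host.
remove_host() {
  local tmp
  [ -f "$1" ] || return 0
  tmp="$(mktemp)"
  grep -vxF "$2" "$1" > "$tmp" || true   # grep exits 1 when no lines remain
  cat "$tmp" > "$1"                      # keeps the original file's owner and mode
  rm -f "$tmp"
}

decommission() {   # run on the master
  add_host "$MAPRED_EXCLUDE" "$1"
  add_host "$HDFS_EXCLUDE" "$1"
  hadoop mradmin -refreshNodes
  hadoop dfsadmin -refreshNodes
}

recommission() {   # run on the master, then start the daemons on the node
  remove_host "$MAPRED_EXCLUDE" "$1"
  remove_host "$HDFS_EXCLUDE" "$1"
  hadoop mradmin -refreshNodes
  hadoop dfsadmin -refreshNodes
}
```

Making add_host idempotent means running decommission twice for the same host does not leave duplicate entries in the exclude files.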
Is this the correct way to scale up and down "nicely"? When scaling down, I'm noticing that job duration rises sharply for certain unlucky jobs (since the tasks they had running on the removed node need to be rescheduled).
If you have not set a dfs exclude file before, follow steps 1-3; otherwise start from step 4.
1. Shut down the NameNode.
2. Set dfs.hosts.exclude to point to an empty exclude file.
3. Restart the NameNode.
4. In the dfs exclude file, specify the nodes using the full hostname, IP, or IP:port format.
5. Do the same in mapred.exclude.
6. Execute bin/hadoop dfsadmin -refreshNodes. This forces the NameNode to reread the exclude file and start the decommissioning process.
7. Execute bin/hadoop mradmin -refreshNodes.
8. Monitor the NameNode and JobTracker web UIs and confirm that the decommission process is in progress. It can take a few seconds to update. Messages like "Decommission complete for node XXXX.XXXX.X.XX:XXXXX" will appear in the NameNode log files when it finishes decommissioning, at which point you can remove the nodes from the cluster.
9. When the process has completed, the NameNode UI will list the datanode as decommissioned, and the JobTracker page will show the updated number of active nodes. Run bin/hadoop dfsadmin -report to verify. Stop the datanode and tasktracker processes on the excluded node(s).
If you do not plan to reintroduce the machine to the cluster, remove it from the include and exclude files.
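Instead of refreshing the web UI, the verification step can be scripted by polling bin/hadoop dfsadmin -report. This sketch assumes the Hadoop 1.x report layout, in which each datanode block starts with "Name: <ip>:<port>" and contains a "Decommission Status : ..." line (Normal, Decommission in progress, or Decommissioned); if your version prints something different, adjust the patterns. The function names, the 30-second interval and the default of 120 attempts are arbitrary choices for illustration.

```shell
#!/usr/bin/env bash
# Print the decommission status that `dfsadmin -report` (read on stdin) shows
# for one datanode. $1 is an IP or IP:port matched against the "Name:" value;
# a bare IP must be followed by ":" so 10.0.0.2 does not match 10.0.0.21.
decommission_status() {
  awk -v node="$1" '
    /^Name: / { cur = $2 }
    /^Decommission Status : / && (cur == node || index(cur, node ":") == 1) {
      sub(/^Decommission Status : /, "")
      print
      exit
    }'
}

# Poll until the node reports "Decommissioned" or the attempts run out.
wait_for_decommission() {
  local node="$1" tries="${2:-120}" status
  while [ "$tries" -gt 0 ]; do
    status="$(hadoop dfsadmin -report | decommission_status "$node")"
    echo "$node: ${status:-not found in report}"
    if [ "$status" = "Decommissioned" ]; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 30
  done
  return 1
}
```

Only after wait_for_decommission returns 0 would you stop the datanode and tasktracker on that node, as in the last step above.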
To add a node as a datanode and tasktracker, see the Hadoop FAQ page.
EDIT: When a live node is removed from the cluster, what happens to the job?
Jobs running on a node being decommissioned are affected: the tasks of the job scheduled on that node(s) are marked KILLED_UNCLEAN (for map and reduce tasks) or KILLED (for job setup and cleanup tasks). See line 4633 in JobTracker.java for details. The job is informed that the task failed. Most of the time, the JobTracker will reschedule execution. However, after many repeated failures it may instead decide to let the entire job fail or succeed. See line 2957 onwards in JobInProgress.java.
You should be aware that for Hadoop to perform well, it really wants the data to be available in multiple copies. By removing nodes, you reduce the chances of the data being optimally available, and you put extra stress on the cluster to ensure availability.
That is, by taking down a node, you force an extra copy of all its data to be made somewhere else. So you shouldn't really be doing this just for fun, unless you use a different data management paradigm than the default configuration (keep 3 copies in the cluster).
And for a Hadoop cluster to perform well, you will want to actually store the data in the cluster. Otherwise, you can't really move the computation to the data, because the data isn't there yet either. Much about Hadoop is about having "smart drives" that can perform computation before sending the data across the network.
So in order to make this reasonable, you will likely need to somehow split your cluster. Have one set of nodes keep the 3 master copies of the original data, and have some "add-on" nodes that are only used for storing intermediate data and perform computations on that part. Never change the master nodes, so they don't need to redistribute your data. Shut down add-on nodes only when they are empty? But that probably is not yet implemented.
While decommissioning is in progress, temporary or staging files get cleaned up automatically. These files are then missing, and Hadoop does not recognize how they went missing. So the decommissioning process keeps waiting until that is resolved, even though the actual decommissioning is done for all the other files.
In the Hadoop web UI, if you notice that "Number of Under-Replicated Blocks" is not decreasing over time, or stays almost constant, this is the likely reason.
So list the files using the command below:
hadoop fsck / -files -blocks -racks
If those files are temporary and not required, delete the files or folder.
Example: hadoop fs -rmr /var/local/hadoop/hadoop/.staging/* (give the correct path here)
This should solve the problem immediately. Decommissioned nodes will move to Dead Nodes within 5 minutes.
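Before deleting anything, it can help to narrow the fsck output down to the files that actually have under-replicated blocks and then check whether they are staging leftovers. This sketch assumes the line layout fsck prints when -files is not given (a path, a colon, then "Under replicated blk_..."); with -files the layout differs, so adjust the pattern if needed. The function name is hypothetical, and it only lists paths; it deletes nothing.

```shell
#!/usr/bin/env bash
# Read `hadoop fsck /` output on stdin and print each path that has at least
# one under-replicated block, once, sorted.
under_replicated_files() {
  awk '/Under replicated/ { sub(/: +Under replicated.*/, ""); print }' | sort -u
}
```

For example, hadoop fsck / | under_replicated_files | grep '/\.staging/' shows only the staging paths, which you can review by hand before removing them with the command above.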
I have a fully distributed Hadoop cluster with 4 nodes. When I submit my job to the JobTracker, which decides that 12 map tasks suit my job, something strange happens: the 12 map tasks always run on a single node instead of across the entire cluster. Before asking, I had already done the following:
Tried a different job
Ran start-balancer.sh to rebalance the cluster
But it does not work, so I hope someone can tell me why and how to fix it.
If all the blocks of the input data files are on that node, the scheduler will prioritize that node.
Apparently the source data files are now on one DataNode. It couldn't be the balancer's fault. From what I can see, either your HDFS has a replication factor of 1, or you are not actually running a fully distributed Hadoop cluster.
Check how your input is being split. You may only have one input split, meaning that only one node will be used to process the data. You can test this by adding more input files to your system, placing them on different nodes, and then checking which nodes are doing the work.
If that doesn't work, check to make sure that your cluster is configured correctly. Specifically, check that your name node has paths to your other nodes set in its slaves file, and that each slave node has your name node set in its masters file.
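To reason about the input-split point, it can help to estimate how many splits (and therefore map tasks) one file produces. The sketch below approximates the per-file logic of the new-API FileInputFormat.getSplits: splitSize = max(minSize, min(maxSize, blockSize)), and the last chunk is not split off while it is at most 10% larger than splitSize (SPLIT_SLOP = 1.1). It ignores non-splittable inputs such as gzip files, which always give one split per file, and count_splits is a hypothetical name.

```shell
#!/usr/bin/env bash
# Approximate the number of input splits for ONE splittable file.
# Usage: count_splits FILE_BYTES BLOCK_BYTES [MIN_SPLIT_BYTES] [MAX_SPLIT_BYTES]
count_splits() {
  local len="$1" block="$2" min="${3:-1}" max="${4:-9223372036854775807}"
  local size="$block" splits=0
  # splitSize = max(minSize, min(maxSize, blockSize))
  if [ "$size" -gt "$max" ]; then size="$max"; fi
  if [ "$size" -lt "$min" ]; then size="$min"; fi
  # An empty file still yields one (empty) split.
  if [ "$len" -eq 0 ]; then echo 1; return; fi
  # Keep cutting full splits while remaining/size > 1.1, in integer form:
  # remaining * 10 > size * 11.
  while [ $((len * 10)) -gt $((size * 11)) ]; do
    splits=$((splits + 1))
    len=$((len - size))
  done
  # Whatever is left (at most 1.1 * size) becomes the final split.
  if [ "$len" -gt 0 ]; then splits=$((splits + 1)); fi
  echo "$splits"
}
```

For example, count_splits $((200*1024*1024)) $((64*1024*1024)) prints 4, while a file smaller than one block gives a single split and therefore a single map task.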