I have a Fully-Distributed Hadoop cluster with 4 nodes.When I submit my job to Jobtracker which decide 12 map tasks will be cool for my job,something strange happens.The 12 map tasks always running on a single node instead of running on the entire cluster.Before I ask the question ,I have already done the things below:
Try different Job
Run start-balance.sh to rebalance the cluster
But it does not work,so I hope someone can tell me why and how to fix it.
If all the blocks of input data files are in that node, the scheduler with prioritize the same node
Apparently the source data files is in one data node now. It could't be the balancer's fault. From what I can see, your hdfs must only have one replication or you are not in a Fully-Distributed Hadoop cluster.
Check how your input is being split. You may only have one input split, meaning that only one Node will be used to process the data. You can test this by adding more input files to your stem and placing them on different nodes, then checking which nodes are doing the work.
If that doesn't work, check to make sure that your cluster is configured correctly. Specifically, check that your name node has paths to your other nodes set in its slaves file, and that each slave node has your name node set in its masters file.
Related
I want to build a cluster environment where nodes are automatically created and deleted. The jobs are to be distributed to the various nodes using Slurm.
Two questions:
Is there an agent or similar for the Slurm workers so that the nodes automatically register with the head node?
Is it possible to change the Slurm config file during runtime? (since new worker nodes could be added or deleted).
You would need to restart the Slurm daemon for changes to the slurm.conf file to take effect, which could be problematic for jobs that are running. You may have errors (job failures or worse) if the Slurm control daemon finds that the slurm.conf is different due to checksum mismatch (see the official docs on adding nodes: https://slurm.schedmd.com/faq.html#add_nodes).
I am confused that when I run the commond " hadoop dfsadmin -report" I can see there
but the resource manager , cluster metric, it shows that
why is that and why could that happen?
Thanks in advance!
Your connected with 9 slave nodes. But 5 slave nodes are in active state remaining are in unhealthy state.
Reason for unhealthy state:
Hadoop MapReduce provides a mechanism by which administrators can configure the TaskTracker to run an administrator supplied script periodically to determine if a node is healthy or not. Administrators can determine if the node is in a healthy state by performing any checks of their choice in the script. If the script detects the node to be in an unhealthy state, it must print a line to standard output beginning with the string ERROR. The TaskTracker spawns the script periodically and checks its output. If the script's output contains the string ERROR, as described above, the node's status is reported as 'unhealthy' and the node is black-listed on the JobTracker. No further tasks will be assigned to this node. However, the TaskTracker continues to run the script, so that if the node becomes healthy again, it will be removed from the blacklisted nodes on the JobTracker automatically. The node's health along with the output of the script, if it is unhealthy, is available to the administrator in the JobTracker's web interface. The time since the node was healthy is also displayed on the web interface.
Reason for Lost Nodes:
I think some BLOCKS (data) may not available in slaves. So It shows lost node as 9.
To remove Dead nodes from cluster use this link To Decommission Nodes
Cluster metrics in ResourceManager show the status of NodeManager.
hadoop dfsadmin -report this command shows the status of Datanodes.
I'm running Hadoop 1.1.2 on a cluster with 10+ machines. I would like to nicely scale up and down, both for HDFS and MapReduce. By "nicely", I mean that I require that data not be lost (allow HDFS nodes to decomission), and nodes running a task finish before shutting down.
I've noticed the datanode process dies once decomissioning is done, which is good. This is what I do to remove a node:
Add node to mapred.exclude
Add node to hdfs.exclude
$ hadoop mradmin -refreshNodes
$ hadoop dfsadmin -refreshNodes
$ hadoop-daemon.sh stop tasktracker
To add the node back in (assuming it was removed like above), this is what I'm doing.
Remove from mapred.exclude
Remove from hdfs.exclude
$ hadoop mradmin -refreshNodes
$ hadoop dfsadmin -refreshNodes
$ hadoop-daemon.sh start tasktracker
$ hadoop-daemon.sh start datanode
Is this the correct way to scale up and down "nicely"? When scaling down, I'm noticing job-duration rises sharply for certain unlucky jobs (since the tasks they had running on the removed node need to be re-scheduled).
If you have not set dfs exclude file before, follow 1-3. Else start from 4.
Shut down the NameNode.
Set dfs.hosts.exclude to point to an empty exclude file.
Restart NameNode.
In the dfs exclude file, specify the nodes using the full hostname or IP or IP:port format.
Do the same in mapred.exclude
execute bin/hadoop dfsadmin -refreshNodes. This forces the NameNode to reread the exclude file and start the decommissioning process.
execute bin/hadoop mradmin -refreshNodes
Monitor the NameNode and JobTracker web UI and confirm the decommission process is in progress. It can take a few seconds to update. Messages like "Decommission complete for node XXXX.XXXX.X.XX:XXXXX" will appear in the NameNode log files when it finishes decommissioning, at which point you can remove the nodes from the cluster.
When the process has completed, the namenode UI will list the datanode as decommissioned. The Jobtracker page will show the updated number of active nodes. Run bin/hadoop dfsadmin -report to verify. Stop the datanode and tasktracker process on the excluded node(s).
If you do not plan to reintroduce the machine to the cluster, remove it from the
include and exclude files.
To add a node as datanode and tasktracker see Hadoop FAQ page
EDIT : When a live node is to be removed from the cluster, what happens to the Job ?
The jobs running on a node to be de-commissioned would get affected as the tasks of the job scheduled on that node(s) would be marked as KILLED_UNCLEAN (for map and reduce tasks) or KILLED (for job setup and cleanup tasks). See line 4633 in JobTracker.java for details. The job will be informed to fail that task. Most of the time, Job tracker will reschedule execution. However, after many repeated failures it may instead decide to allow the entire job to fail or succeed. See line 2957 onwards in JobInProgress.java.
You should be aware that since for Hadoop to perform well, it really wants to have the data available in multiple copies. By removing nodes, you remove the chances of the data being optimally available, and you put extra stress on the cluster to ensure the availablility.
I.e. by taking down a node, you do enfore that an extra copy of all its data is made somewhere else. So you shouldn't really be doing this just for fun, not unless you use a different data management paradigm than in the default configuration (= keep 3 copies in the cluster).
And for a Hadoop cluster to perform well, you will want to actually store the data in the cluster. Otherwise, you can't really move the computation to the data, because the data isn't there yet either. Much about Hadoop is about having "smart drives" that can perform computation before sending the data across the network.
So in order to make this reasonable, you will likely need to somehow split your cluster. Have one set of nodes keep the 3 master copies of the original data, and have some "add-on" nodes that are only used for storing intermediate data and perform computations on that part. Never change the master nodes, so they don't need to redistribute your data. Shut down add-on nodes only when they are empty? But that probably is not yet implemented.
While decommissioning in progress, temporary or staging files get cleaned automatically. These files are missing now and hadoop is not recognizing how that went missing. So the decommissioning process keeps waiting until that is resolved even though the actual decommissioning is done for all the other files.
In Hadoop GUI - if you notice the parameter "Number of Under-Replicated Blocks" is not reducing over the time or almost constant then this is the reason likely.
So list the files using below command
hadoop fsck / -files -blocks -racks
If you see those files are temporary and not required then delete those files or folder
Example: hadoop fs -rmr /var/local/hadoop/hadoop/.staging/* (give the correct path here)
This would solve the problem immediately. De-commissioned nodes will move to Dead Nodes in 5 mins.
I'm looking for some specific information regarding the chain of events when running a MapReduce job on a Hadoop cluster.
Let's assume that my Reduce tasks are on the verge of completion. After my last reducer has written its output to the output file, how many replicas of the output file are there?
What exactly happens after the last reducer has finished writing to the output file. When does the NameNode request the respective Data Nodes to replicate the output file? And how is the Name Node informed that the output file is ready? Who conveys that information to the NameNode?
Thank you!
The Reduce tasks write output to HDFS. They do this by first communicating with the name node to request a block. The name node then tells the reducer which data nodes to write to, and then the reducer actually sends the data directly to the first data node, which then sends it to the second data node, which sends it to the third node. Typically the name node will keep things local, so the first data node is probably the same machine that is running the reduce task.
Once the reducer has finished writing outputs, and the data nodes have confirmed this, the reducer itself will tell the job tracker that it has finished via periodic heartbeat communication.
To understand the basics of HDFS replication, have a read over replica placement in the HDFS architecture document. In a nutshell, the NameNode will try to use the same rack to minimize latency.
I run some batch jobs with data inputs that are constantly changing and I'm having problems provisioning capacity. I am using whirl to do the intial setup but once I start, for example, 5 machines I don't know how to add new machines to it while its running. I don't know in advance how complex or how large the data will be so I was wondering if there was a way to add new machines to a cluster and have it take effect right away(or with some delay but don't want to have to bring down the cluster and bring it up with the new nodes).
There is exact explanation how to add node:
http://wiki.apache.org/hadoop/FAQ#I_have_a_new_node_I_want_to_add_to_a_running_Hadoop_cluster.3B_how_do_I_start_services_on_just_one_node.3F
In the same time - I am not sure that already running jobs will take advantages of these nodes since planning where to run each task happens during job start time (as far as I understand).
I also think that it is more practical to run Task Trackers only on these transient nodes.
Check the files referred by the below parameters:
dfs.hosts => dfs.include
dfs.hosts.exclude
mapreduce.jobtracker.hosts.filename => mapred.include
mapreduce.jobtracker.hosts.exclude.filename
You can add the list of hosts to the files dfs.include and mapred.include and then run
hadoop mradmin -refreshNodes ;
hadoop dfsadmin -refreshNodes ;
That's all.
BTW, 'mradmin -refreshNodes' facility was added in 0.21
Nikhil