hadoop distcp not working, MR job in accepted state - hadoop

I am trying to copy data from a CDH4 cluster to a CDH5 cluster. When I submit the distcp job from CDH5, the MR job goes to the ACCEPTED state and stays there (I have tried it multiple times; it stayed there for more than 15 hrs). The data I want to copy is less than 10 MB.
Below is the setup and the steps I am using.
Source: CDH4, e.g. NodeName = cloudera4
Destination: CDH5, e.g. NodeName = Cloudera1
Command used on CDH5:
hadoop distcp hftp://Cloudera4:50070/ hdfs://Cloudera1/
Below is the console output:
[root@Cloudera1-RD opt]# sudo -u hdfs hadoop distcp hftp://Cloudera4:50070/ hdfs://Cloudera1/
15/03/05 10:51:23 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[hftp://Cloudera4:50070/], targetPath=hdfs://Cloudera1/, targetPathExists=true, preserveRawXattrs=false}
15/03/05 10:51:23 INFO client.RMProxy: Connecting to ResourceManager at Cloudera1:8032
15/03/05 10:51:27 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
15/03/05 10:51:27 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
15/03/05 10:51:28 INFO client.RMProxy: Connecting to ResourceManager at Cloudera1:8032
15/03/05 10:51:29 INFO mapreduce.JobSubmitter: number of splits:18
15/03/05 10:51:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1425491750932_0010
15/03/05 10:51:30 INFO impl.YarnClientImpl: Submitted application application_1425491750932_0010
15/03/05 10:51:30 INFO mapreduce.Job: The url to track the job: http://Cloudera1:8088/proxy/application_1425491750932_0010/
15/03/05 10:51:30 INFO tools.DistCp: DistCp job-id: job_1425491750932_0010
15/03/05 10:51:30 INFO mapreduce.Job: Running job: job_1425491750932_0010
This MR job stays in the ACCEPTED state forever.
I have been stuck on this for many days now.
Any help is really appreciated.

The problem is that you are running distcp as the hdfs user, which is blacklisted for MapReduce jobs by default.
Refer to the linked documentation and run distcp as a non-blacklisted user.
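As a rough sketch of what that looks like (the user name etluser is only an example; any regular user with write access to the target path should work):
# run distcp as a regular, non-blacklisted user instead of hdfs
sudo -u etluser hadoop distcp hftp://Cloudera4:50070/ hdfs://Cloudera1/
If the cluster uses the LinuxContainerExecutor, the blacklist is typically the banned.users setting in container-executor.cfg, which lists hdfs and yarn by default.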

Solved it by using:
hdfs dfs -cp s3://<path> hdfs:///user/livy/

Related

MapReduce job never enters in running state

I have a small jar file which I know is correct, because I tested it on another computer and it works with Hadoop there.
Now I have set up Hadoop on my PC, and when I submit a job it never gets past the ACCEPTED state.
In the browser I can see that the job is accepted, but it never gets executed. Here is the screenshot.
I see there is a warning in the console:
WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
The full logs are:
C:\Users\afraz\Desktop\MapReduceData>hadoop jar outs.jar 1902 spo
2019-05-01 22:27:40,842 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2019-05-01 22:27:41,882 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2019-05-01 22:27:41,925 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/afraz/.staging/job_1556742397967_0001
2019-05-01 22:27:42,890 INFO input.FileInputFormat: Total input files to process : 1
2019-05-01 22:27:43,048 INFO mapreduce.JobSubmitter: number of splits:1
2019-05-01 22:27:43,250 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1556742397967_0001
2019-05-01 22:27:43,254 INFO mapreduce.JobSubmitter: Executing with tokens: []
2019-05-01 22:27:43,543 INFO conf.Configuration: resource-types.xml not found
2019-05-01 22:27:43,544 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2019-05-01 22:27:44,093 INFO impl.YarnClientImpl: Submitted application application_1556742397967_0001
2019-05-01 22:27:44,155 INFO mapreduce.Job: The url to track the job: http://LAPTOP-PN52M98R:8088/proxy/application_1556742397967_0001/
2019-05-01 22:27:44,157 INFO mapreduce.Job: Running job: job_1556742397967_0001
Any help would be great.
It seems you have no active NodeManagers:
Memory Total: 0B VCores Total: 0
Unhealthy Nodes: 1
Your job was accepted by the ResourceManager, but it cannot run until there are resources available to start it on.
I suggest finding the NodeManager log file on your machine and checking whether any noticeable exceptions are mentioned there.
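A quick way to confirm this from the command line, using the standard YARN CLI, is to list the nodes the ResourceManager knows about and inspect the unhealthy one:
# list all nodes, including unhealthy/lost ones, with their state
yarn node -list -all
# show resource totals and the health report for a specific node
yarn node -status <node-id>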

Running Hadoop MapReduce word count for the first time fails?

When running the Hadoop word count example the first time it fails. Here's what I'm doing:
Format namenode: $HADOOP_HOME/bin/hdfs namenode -format
Start HDFS/YARN:
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh
$HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager
Run wordcount: hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount input output
(assume the input folder is already in HDFS; I'm not going to list every single command here)
Output:
16/07/17 01:04:34 INFO client.RMProxy: Connecting to ResourceManager at hadoop-master/172.20.0.2:8032
16/07/17 01:04:35 INFO input.FileInputFormat: Total input paths to process : 2
16/07/17 01:04:35 INFO mapreduce.JobSubmitter: number of splits:2
16/07/17 01:04:36 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1468688654488_0001
16/07/17 01:04:36 INFO impl.YarnClientImpl: Submitted application application_1468688654488_0001
16/07/17 01:04:36 INFO mapreduce.Job: The url to track the job: http://hadoop-master:8088/proxy/application_1468688654488_0001/
16/07/17 01:04:36 INFO mapreduce.Job: Running job: job_1468688654488_0001
16/07/17 01:04:46 INFO mapreduce.Job: Job job_1468688654488_0001 running in uber mode : false
16/07/17 01:04:46 INFO mapreduce.Job: map 0% reduce 0%
Terminated
And then HDFS crashes, so I can't access http://localhost:50070/.
Then I restart everything (repeat step 2), rerun the example, and everything's fine.
How can I fix it for the first run? My HDFS obviously has no data the first time around; maybe that's the problem?
UPDATE:
Running an even simpler example fails as well:
hadoop#8f98bf86ceba:~$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples*.jar pi 3 3
Number of Maps = 3
Samples per Map = 3
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Starting Job
16/07/17 03:21:28 INFO client.RMProxy: Connecting to ResourceManager at hadoop-master/172.20.0.3:8032
16/07/17 03:21:29 INFO input.FileInputFormat: Total input paths to process : 3
16/07/17 03:21:29 INFO mapreduce.JobSubmitter: number of splits:3
16/07/17 03:21:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1468696855031_0001
16/07/17 03:21:31 INFO impl.YarnClientImpl: Submitted application application_1468696855031_0001
16/07/17 03:21:31 INFO mapreduce.Job: The url to track the job: http://hadoop-master:8088/proxy/application_1468696855031_0001/
16/07/17 03:21:31 INFO mapreduce.Job: Running job: job_1468696855031_0001
16/07/17 03:21:43 INFO mapreduce.Job: Job job_1468696855031_0001 running in uber mode : false
16/07/17 03:21:43 INFO mapreduce.Job: map 0% reduce 0%
Same problem, HDFS terminates
Your post is too incomplete to deduce what is wrong here. My guess is that hadoop-mapreduce-examples-2.7.2-sources.jar is not what you want. More likely you need hadoop-mapreduce-examples-2.7.2.jar, which contains the .class files rather than the sources.
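If in doubt, you can check whether the jar actually contains compiled classes (the path below is just the usual location of the examples jar; adjust it to your install):
# a sources jar contains only .java files; the runnable examples jar contains .class files
jar tf $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar | grep -c '\.class$'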
HDFS has to be restarted the first time before MapReduce jobs can be run successfully. This is because HDFS creates some data on the first run, and stopping it cleans up that state so MapReduce jobs can be run through YARN afterwards.
So my solution was:
Start Hadoop: $HADOOP_HOME/sbin/start-dfs.sh
Stop Hadoop: $HADOOP_HOME/sbin/stop-dfs.sh
Start Hadoop again: $HADOOP_HOME/sbin/start-dfs.sh
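After the restart, a quick sanity check (jps ships with the JDK; the exact daemon list can vary by setup) is to confirm the HDFS and YARN daemons are actually up:
# expect NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager among the output
jps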

Hadoop program stuck at "Running job:"

I was running a Hadoop program (wordcount) in the Hortonworks sandbox, and the situation below occurred. Notably, this is a program I had run successfully many times on exactly the same virtual machine, yet this time it "failed" without any notification and just got stuck. I tried other MapReduce programs, and the results were similar. Normally, the command line notifies me with uber mode : false, followed by Running job..., but this time it doesn't, for no apparent reason.
[root@sandbox ~]# hadoop jar testWC.jar testWC.WCdriver /data/input/pg103.txt /data/output/WC
WARNING: Use "yarn jar" to launch YARN applications.
16/03/11 19:20:01 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
16/03/11 19:20:01 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
16/03/11 19:20:01 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/03/11 19:20:02 INFO input.FileInputFormat: Total input paths to process : 1
16/03/11 19:20:02 INFO mapreduce.JobSubmitter: number of splits:1
16/03/11 19:20:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1457723341319_0002
16/03/11 19:20:03 INFO impl.YarnClientImpl: Submitted application application_1457723341319_0002
16/03/11 19:20:03 INFO mapreduce.Job: The url to track the job: http://sandbox.hortonworks.com:8088/proxy/application_1457723341319_0002/
16/03/11 19:20:03 INFO mapreduce.Job: Running job: job_1457723341319_0002
The program just could not move on anymore.

Why is MapReduce with YARN stuck on CDH 5.3?

MapReduce with YARN fails to move past 0% map and 0% reduce. I am using Cloudera CDH on a Google Compute Engine high-memory instance (13 GB RAM). 8 GB of free RAM is available on the machine. Can you please help me fix it?
sunny#hadoop-m:~$ hadoop jar /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/jars/hadoop-mapreduce-examples-2.5.0-cdh5.3.0.jar grep input output 'dfs[a-z.]+'
14/12/24 00:13:53 INFO client.RMProxy: Connecting to ResourceManager at hadoop-m.c.sunny-hadoop-trial.internal/10.240.253.233:8032
14/12/24 00:13:53 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
14/12/24 00:13:54 INFO input.FileInputFormat: Total input paths to process : 5
14/12/24 00:13:54 INFO mapreduce.JobSubmitter: number of splits:5
14/12/24 00:13:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1419360146634_0001
14/12/24 00:13:54 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
14/12/24 00:13:54 INFO impl.YarnClientImpl: Submitted application application_1419360146634_0001
14/12/24 00:13:55 INFO mapreduce.Job: The url to track the job: http://hadoop-m.c.sunny-hadoop-trial.internal:8088/proxy/application_1419360146634_0001/
14/12/24 00:13:55 INFO mapreduce.Job: Running job: job_1419360146634_0001
Resource Manager output:
Some more info about the job:
yarn-site.xml: http://pastebin.mozilla.org/8113782
mapred-site.xml: http://pastebin.mozilla.org/8113813
The server's IP had changed because of the DHCP service, so the client configuration for HDFS and YARN became stale. I needed to update the client configuration; I did it with Cloudera Manager, and now the cluster is running fine.
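Outside of Cloudera Manager, one way to spot a stale client configuration (assuming the usual CDH client config directory /etc/hadoop/conf) is to compare the addresses the clients resolve against the host's current IP:
# the NameNode address the HDFS client will use
hdfs getconf -confKey fs.defaultFS
# the ResourceManager address in the YARN client config
grep -A1 'yarn.resourcemanager.address' /etc/hadoop/conf/yarn-site.xml
# the host's current IP, for comparison
hostname -i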

Hadoop - Example MapReduce Application not running

I deployed Hadoop 2.2.0 in Ubuntu 12.04 LTS according this article: http://codesfusion.blogspot.com/2013/10/setup-hadoop-2x-220-on-ubuntu.html?m=1
Everything is OK except that when I try to run the Hadoop example at the last step, it pauses with the "Running job" message:
13/11/24 23:36:30 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/11/24 23:36:30 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/11/24 23:36:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1385310900376_0001
13/11/24 23:36:32 INFO impl.YarnClientImpl: Submitted application application_1385310900376_0001 to ResourceManager at master/192.168.56.1:8040
13/11/24 23:36:32 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1385310900376_0001/
13/11/24 23:36:32 INFO mapreduce.Job: Running job: job_1385310900376_0001
In the ResourceManager web GUI, I see "App is Pending". So, how can I change it to the Running state?
Screenshot: http://farm8.staticflickr.com/7344/11031415055_d987e937aa_o.png
Thank you! :)
