Application status in Hadoop cluster manager is undefined

I am trying to run a HelloWorld program in Pig or Hive on my single-node cluster setup. All my applications hang, including the word count example in Pig. My Hive command below also hangs:
SELECT COUNT(*) FROM u_data;
My Hive log stops as follows:
set mapreduce.job.reduces=<number>
Starting Job = job_1433035285759_0006, Tracking URL = http://der-Inspiron-3521:8088/proxy/application_1433035285759_0006/
Kill Command = /home/der/utility/hadoop-2.7.0/bin/hadoop job -kill job_1433035285759_0006
I followed the link. The application shows:
status = undefined
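For what it's worth, a final status of UNDEFINED in the ResourceManager UI just means the application has not finished; combined with a job that never progresses, it usually means the application is stuck waiting for container resources. A quick sanity check from the shell (a sketch, assuming the Hadoop bin directory is on the PATH):
# Is any NodeManager registered and healthy?
yarn node -list
# Is the application stuck in the ACCEPTED state (i.e. no resources for the AM)?
yarn application -list -appStates ACCEPTED,RUNNING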
Here's my map-reduce configuration file:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>localhost:10020</value>
<description>Host and port for Job History Server (default 0.0.0.0:10020)</description>
</property>
<property>
<name>mapreduce.jobtracker.http.address</name>
<value>localhost:50030</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>
My Hadoop server is version 2.7.0.
I tried Hadoop server 2.6.0, but I had other issues with it.
My Pig script hangs at the MapReduce launching stage:
2015-05-30 19:34:59,641 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: lines[1,8],words[-1,-1],wordcount[4,12],grouped[3,10] C: wordcount[4,12],grouped[3,10] R: wordcount[4,12]
2015-05-30 19:34:59,666 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2015-05-30 19:34:59,666 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1433039621140_0001]
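In a situation like this, the container logs of the stuck job are usually more informative than the client output; a hedged sketch of how to pull them (the application ID comes from the Pig output above):
# Check that both ResourceManager and NodeManager daemons are running
jps
# Fetch the application's logs (works once log aggregation is enabled;
# otherwise look under $HADOOP_HOME/logs/userlogs on the NodeManager host)
yarn logs -applicationId application_1433039621140_0001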

Related

Hadoop Wordcount example failing due to AM container

I've been trying to run the Hadoop wordcount example for a while now, but I am facing some issues. I have Hadoop 2.7.1 and am running it on Windows. Below are the error details:
command:
yarn jar C:\hadoop-2.7.1\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.7.1.jar wordcount input output
Output:
INFO input.FileInputFormat: Total input paths to process : 1
INFO mapreduce.JobSubmitter: number of splits:1
INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1490853163147_0009
INFO impl.YarnClientImpl: Submitted application application_1490853163147_0009
INFO mapreduce.Job: The url to track the job: http://**********/proxy/application_1490853163147_0009/
INFO mapreduce.Job: Running job: job_1490853163147_0009
INFO mapreduce.Job: Job job_1490853163147_0009 running in uber mode : false
INFO mapreduce.Job: map 0% reduce 0%
INFO mapreduce.Job: Job job_1490853163147_0009 failed with state FAILED due to: Application application_1490853163147_0009 failed 2 times due to AM Container for appattempt_1490853163147_0009_000002 exited with exitCode: 1639
For more detailed output, check application tracking page: http://********:****/cluster/app/application_1490853163147_0009 Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1490853163147_0009_02_000001
Exit code: 1639
Exception message: Incorrect command line arguments.
Stack trace: ExitCodeException exitCode=1639: Incorrect command line arguments.
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Shell output: Usage: task create [TASKNAME] [COMMAND_LINE] |
task isAlive [TASKNAME] |
task kill [TASKNAME]
task processList [TASKNAME]
Creates a new task jobobject with taskname
Checks if task jobobject is alive
Kills task jobobject
Prints to stdout a list of processes in the task along with their resource usage. One process per line and comma separated info per process:
ProcessId,VirtualMemoryCommitted(bytes),WorkingSetSize(bytes),CpuTime(Millisec,Kernel+User)
Container exited with a non-zero exit code 1639
Failing this attempt. Failing the application.
INFO mapreduce.Job: Counters: 0
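Since yarn.log-aggregation-enable is set to true in the yarn-site.xml below, the AM attempt's stdout and stderr should also be retrievable from the command line after the failure, along these lines:
yarn logs -applicationId application_1490853163147_0009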
yarn-site.xml:
<configuration>
<property>
<name>yarn.application.classpath</name>
<value>
C:\hadoop-2.7.1\etc\hadoop,
C:\hadoop-2.7.1\share\hadoop\common\*,
C:\hadoop-2.7.1\share\hadoop\common\lib\*,
C:\hadoop-2.7.1\share\hadoop\hdfs\*,
C:\hadoop-2.7.1\share\hadoop\hdfs\lib\*,
C:\hadoop-2.7.1\share\hadoop\mapreduce\*,
C:\hadoop-2.7.1\share\hadoop\mapreduce\lib\*,
C:\hadoop-2.7.1\share\hadoop\yarn\*,
C:\hadoop-2.7.1\share\hadoop\yarn\lib\*
</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
<value>98.5</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2200</value>
<description>Amount of physical memory, in MB, that can be allocated for containers.</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>500</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<description>Where to aggregate logs to.</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/tmp/logs</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>259200</value>
</property>
<property>
<name>yarn.log-aggregation.retain-check-interval-seconds</name>
<value>3600</value>
</property>
</configuration>
mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Any idea on what is going wrong?
exitCode: 1639 looks like you are running Hadoop on Windows; 1639 is the Windows Installer error code for an invalid command line argument:
https://github.com/OctopusDeploy/Issues/issues/1346
I faced exactly the same problem. I was following a guide on how to install Hadoop 2.6.0 (http://www.ics.uci.edu/~shantas/Install_Hadoop-2.6.0_on_Windows10.pdf) while actually installing Hadoop 2.8.0.
As soon as I was done, I ran:
hadoop jar D:\hadoop-2.8.0\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.8.0.jar wordcount /foo/bar/LICENSE.txt /out1
And got (from yarn nodemanager's logs):
17/06/19 13:15:30 INFO monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1497902417767_0004_01_000001
17/06/19 13:15:30 INFO nodemanager.DefaultContainerExecutor: launchContainer: [D:\hadoop-2.8.0\bin\winutils.exe, task, create, -m, -1, -c, -1, container_1497902417767_0004_01_000001, cmd /c D:/hadoop/temp/nm-localdir/usercache/******/appcache/application_1497902417767_0004/container_1497902417767_0004_01_000001/default_container_executor.cmd]
17/06/19 13:15:30 WARN nodemanager.DefaultContainerExecutor: Exit code from container container_1497902417767_0004_01_000001 is : 1639
17/06/19 13:15:30 WARN nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1497902417767_0004_01_000001 and exit code: 1639
ExitCodeException exitCode=1639: Incorrect command line arguments.
TaskExit: error (1639): Invalid command line argument. Consult the Windows Installer SDK for detailed command line help.
Another symptom was (from yarn nodemanager's logs):
17/06/19 13:25:49 WARN util.SysInfoWindows: Expected split length of sysInfo to be 11. Got 7
The solution was to get binaries compatible with Hadoop 2.8.0: https://github.com/steveloughran/winutils/tree/master/hadoop-2.8.0-RC3/bin
Once I got the correct winutils.exe, my problem went away.
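A way to verify the mismatch up front: the SysInfoWindows warning above comes from Hadoop parsing the output of winutils.exe systeminfo, so running that probe by hand shows whether the binary matches the jars. A sketch (assuming HADOOP_HOME points at the install):
:: A winutils.exe compatible with the 2.8.x jars prints one comma-separated
:: line with 11 fields; an older one prints only 7
%HADOOP_HOME%\bin\winutils.exe systeminfo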

Configuring HCatalog, WebHCat with Hive

I'm installing Hadoop and Hive, integrated with WebHCat, which will be used to run Hive queries as Hadoop MapReduce jobs.
I installed Hadoop 2.4.1 and Hive 0.13.0 (the latest stable versions).
The request I'm sending using the web interface is:
POST: http://localhost:50111/templeton/v1/hive?user.name='hadoop'&statusdir='out'&execute='show tables'
And I got response as the following:
{
"id": "job_local229830426_0001"
}
But in the webhcat-console-error.log I find that the exit value of this job is 1, which means some error occurred. Tracking this error down, I found: Missing argument for option: hiveconf
This is webhcat-site.xml, which contains the WebHCat configuration (WebHCat was previously known as Templeton):
<configuration>
<property>
<name>templeton.port</name>
<value>50111</value>
<description>The HTTP port for the main server.</description>
</property>
<property>
<name>templeton.hive.path</name>
<value>/usr/local/hive/bin/hive</value>
<description>The path to the Hive executable.</description>
</property>
<property>
<name>templeton.hive.properties</name>
<value>hive.metastore.local=false,hive.metastore.uris=thrift://localhost:9933,hive.metastore.sasl.enabled=false</value>
<description>Properties to set when running hive.</description>
</property>
</configuration>
But the executed command is odd, as it has some additional --hiveconf parameters with no values:
tool.TrivialExecService: Starting cmd: [/usr/local/hive/bin/hive, --service, cli, --hiveconf, --hiveconf, --hiveconf, hive.metastore.local=false, --hiveconf, hive.metastore.uris=thrift://localhost:9933, --hiveconf, hive.metastore.sasl.enabled=false, -e, show tables]
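One way to narrow this down is to run the same command WebHCat builds, minus the empty --hiveconf flags, directly from a shell. If this works, the problem is in how templeton.hive.properties gets parsed, not in Hive or the metastore (a sketch using the paths from the config above):
/usr/local/hive/bin/hive --service cli \
  --hiveconf hive.metastore.local=false \
  --hiveconf hive.metastore.uris=thrift://localhost:9933 \
  --hiveconf hive.metastore.sasl.enabled=false \
  -e 'show tables'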
Any idea?

yarn hadoop 2.4.0: info message: ipc.Client Retrying connect to server

I've searched for two days for a solution, but nothing has worked.
First, I'm new to the whole Hadoop/YARN/HDFS topic and want to configure a small cluster.
The message in the title doesn't show up every time I run an example from the mapreduce-examples.jar; sometimes teragen works, sometimes not.
In some cases the whole job fails, in others it finishes successfully. Sometimes the job fails without printing the message at all.
14/06/08 15:42:46 INFO ipc.Client: Retrying connect to server: FQDN-HOSTNAME/XXX.XX.XX.XXX:53022. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
This message is printed 30 times, and the port (53022 in the example above) changes every time a job is started.
If the job finishes successfully, this is printed:
14/06/08 15:34:20 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
14/06/08 15:34:20 INFO mapreduce.Job: Job job_1402234146062_0002 running in uber mode : false
14/06/08 15:34:20 INFO mapreduce.Job: map 100% reduce 100%
14/06/08 15:34:20 INFO mapreduce.Job: Job job_1402234146062_0002 completed successfully
If it fails, this is shown:
INFO mapreduce.Job: Job job_1402234146062_0005 failed with state FAILED due to: Task failed task_1402234146062_0005_m_000002
Job failed as tasks failed. failedMaps:1 failedReduces:0
In this case, some tasks failed, but the log files of the NodeManager, DataNode, ResourceManager, etc. give no reason or error message.
INFO mapreduce.Job: Task Id : attempt_1402234146062_0006_m_000002_1, Status : FAILED
Additional information about my configuration:
OS: CentOS 6.5
Java version: OpenJDK Runtime Environment (rhel-2.4.7.1.el6_5-x86_64 u55-b13), OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.address</name>
<value>FQDN-HOSTNAME:8050</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.localizer.address</name>
<value>FQDN-HOSTNAME:8040</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>FQDN-HOSTNAME:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>FQDN-HOSTNAME:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>FQDN-HOSTNAME:8032</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///var/data/hadoop/hdfs/nn</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>file:///var/data/hadoop/hdfs/snn</value>
</property>
<property>
<name>fs.checkpoint.edits.dir</name>
<value>file:///var/data/hadoop/hdfs/snn</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///var/data/hadoop/hdfs/dn</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.cluster.temp.dir</name>
<value>/mapred/tempDir</value>
</property>
<property>
<name>mapreduce.cluster.local.dir</name>
<value>/mapred/localDir</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>FQDN-HOSTNAME:10020</value>
</property>
</configuration>
I hope somebody can help me. :)
Thank you,
Norman
The job sometimes finishes successfully because, when you have one reducer and that reduce task happens to be sent to a working NodeManager, the job succeeds.
You have to make sure that FQDN-HOSTNAME is written exactly the same way in the slaves file. If I remember correctly, my solution was to remove the hostname mapping from /etc/hosts, i.e. commenting it out like this:
#127.0.0.1 FQDN-HOSTNAME
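In other words, every node's /etc/hosts should map the FQDN to its routable address rather than to loopback; a minimal sketch (the 192.168.1.10 address is a placeholder):
127.0.0.1      localhost
#127.0.0.1     FQDN-HOSTNAME      <- the loopback mapping that causes the trouble
192.168.1.10   FQDN-HOSTNAME      <- the node's real LAN address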
This is a bug in how the MR AppMaster starts up with ephemeral ports. It exists in the Hadoop 2.6.0 release as well.
I have figured out a fix for this bug and created a JIRA issue on the MAPREDUCE project, along with a comment on how to fix it:
https://issues.apache.org/jira/browse/MAPREDUCE-6338
Another possible solution is to check the firewall on all the nodes.
If you're dealing with iptables, you can run this on every node:
# /etc/init.d/iptables save
# /etc/init.d/iptables stop
That will stop the firewall until the next restart, but it should be enough for you to test the cluster. You don't have to restart YARN or anything; just run the job again.
If you want to disable the firewall permanently:
# chkconfig iptables off
This is definitely a bug; the post below provides clearer insight into what is happening:
https://groups.google.com/a/cloudera.org/forum/#!msg/cdh-user/P1rfMQmYVWk/eARZXHUTkW0J
We are planning to get around this issue by reducing the ephemeral port range, thus limiting which ports are grabbed, and then configuring iptables to allow that port range. Setting the port range is explained here:
http://www.ncftp.com/ncftpd/doc/misc/ephemeral_ports.html
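On Linux the ephemeral range is a kernel setting, so the workaround sketched above boils down to something like this (the range is illustrative):
# Shrink the ephemeral port range the AM can bind to...
sysctl -w net.ipv4.ip_local_port_range="50000 51000"
# ...and open exactly that range in iptables
iptables -I INPUT -p tcp --dport 50000:51000 -j ACCEPT
/etc/init.d/iptables save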
If you see a message like
INFO ipc.Client: Retrying connect to server: <hostname>/<ip>:<port>. Already tried 1 time(s); maxRetries=3
you need to check:
your firewall between the client and the NodeManager
yarn.app.mapreduce.am.job.client.port-range (by default the range is all possible ports; a sketch of pinning it down follows below)
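A hedged example of pinning down the AM client port range at submission time (the 50100-50200 range is arbitrary; it just has to match what the firewall allows, and the same property can also go into mapred-site.xml):
hadoop jar hadoop-mapreduce-examples-2.4.0.jar teragen \
  -Dyarn.app.mapreduce.am.job.client.port-range=50100-50200 \
  1000 /teragen-out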
Wow! Are these answers for real?? Talking about FQDNs when the job clearly completes... as long as the firewall is disabled?? And the OP even included detailed log messages and configuration.
The problem is that yarn.app.mapreduce.am.job.client.port-range is not being honored. I'm running into it as well.
Firewall off... all is well (and I can see the ephemeral ports from the YARN job).
Firewall on... everything times out (eventually).
Hortonworks completely ignores this question on other boards.
So here's log output from a job that demonstrates the problem. At first the firewall is enabled on the client(s), based on Hortonworks' docs (along with other ports I discovered by looking very closely at my installation). You will see the process timing out... and then all of a sudden working, because I disabled the firewall while watching the job output :)
2015-01-15 16:48:22,943 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: de-luster-l2723nraqsy5-ywhniidze3lb-qfk4asn77vc5/10.0.0.41:52015. Already tried 39 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000 MILLISECONDS)
2015-01-15 16:48:23,349 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /hadoop/yarn/local/usercache/l.admin/appcache/application_1420482341308_0020
2015-01-15 16:48:24,122 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2015-01-15 16:48:24,656 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2015-01-15 16:48:24,724 INFO [main] org.apache.hadoop.mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle#7f94ee59
2015-01-15 16:48:24,792 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: MergerManager: memoryLimit=534354336, maxSingleShuffleLimit=133588584, mergeThreshold=352673888, ioSortFactor=100, memToMemMergeOutputsThreshold=100
Did ya see it?? Problem with timeout...then all of a sudden Shuffle commences. Nothing to do with FQDNs after all :)

Hadoop cannot start the Thrift server, and Hue can't communicate with the Hadoop namenode and datanode

I installed Hadoop CDH3u6 on 3 machines, but when I start Hadoop and check the namenode log, I find:
2014-06-22 13:58:39,535 WARN org.apache.hadoop.util.PluginDispatcher: Unable to load dfs.namenode.plugins plugins
So the Hadoop Thrift server cannot start, and Hue gives an exception:
Exception communicating with HDFS Namenode HUE Plugin at x.x.x.x:50903: Could not connect to x.x.x.x:50903
My Hadoop configs are as follows:
1. hdfs-site.xml
<property>
<name>dfs.namenode.plugins</name>
<value>org.apache.hadoop.thriftfs.NamenodePlugin</value>
<description>Comma-separated list of namenode plugins to be activated.
</description>
</property>
<property>
<name>dfs.datanode.plugins</name>
<value>org.apache.hadoop.thriftfs.DatanodePlugin</value>
<description>Comma-separated list of datanode plugins to be activated.
</description>
</property>
<property>
<name>dfs.thrift.address</name>
<value>0.0.0.0:50903</value>
</property>
Is the plugin jar installed, e.g. /usr/lib/hadoop/lib/hue-plugins-2.2.0-SNAPSHOT.jar?
Could you list the installed Hadoop packages?
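Both checks can be scripted; a rough sketch (paths and package names vary between tarball and CDH installs):
# Is the Hue plugin jar on the namenode's classpath?
ls /usr/lib/hadoop/lib/ | grep -i hue-plugins
# Is anything listening on the Thrift plugin port?
netstat -tlnp | grep 50903
# List the installed Hadoop packages (RPM-based systems)
rpm -qa | grep -i hadoop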

Error running mapreduce sample in hadoop 0.23.6

I deployed Hadoop 0.23.6 on Ubuntu 12.04 LTS. I am able to copy files across and do file manipulation. I am using YARN for MapReduce.
I am getting the following error when I try to run any MapReduce application using hadoop-mapreduce-examples-0.23.6.jar.
Command used:
bin/hadoop jar hadoop-mapreduce-examples-0.23.6.jar randomwriter -Dmapreduce.randomwriter.mapsperhost=1 -Dmapreduce.job.user.name=$USER -Dmapreduce.randomwriter.bytespermap=10000 -Ddfs.blocksize=536870912 -Ddfs.block.size=536870912 -libjars hadoop-mapreduce-client-app-0.23.6.jar output
Hadoop version: 0.23.6
Container launch failed for container_1364342550899_0001_01_000002 : java.lang.IllegalStateException: Invalid shuffle port number -1 returned for attempt_1364342550899_0001_m_000000_0
Verify your yarn-site.xml configuration. You need to have the properties below configured.
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
For more details, have a look at the JIRA:
https://issues.apache.org/jira/browse/MAPREDUCE-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
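One more hedged note: after changing yarn.nodemanager.aux-services, the NodeManager must be restarted for the ShuffleHandler to load. In the 0.23.x layout that is roughly (run from the Hadoop install directory):
sbin/yarn-daemon.sh stop nodemanager
sbin/yarn-daemon.sh start nodemanager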
