I am unable to run queries on Hive. The query fails right after the MapReduce job launches (MAP 0% REDUCE 0%). I found the following error in the NodeManager logs.
2017-03-16 11:53:03,581 ERROR [ContainerLauncher #0] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1489041811986_0005_01_000002 : java.lang.IllegalArgumentException: Does not contain a valid host:port authority: slave_1:60805
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:213)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153)
at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:258)
at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.<init>(ContainerManagementProtocolProxy.java:244)
at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:129)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:409)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:375)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I guess it is not able to resolve the hostname slave_1 to its IP address.
Any help will be appreciated.
Thanks.
I got the same error and, after struggling with it for several days, solved it with the following steps:
1. Open the file /etc/hosts.
2. Since your error message is "Does not contain a valid host:port authority: slave_1:60805", there should be an entry for "slave_1" in /etc/hosts, for example "127.0.0.1 slave_1" or "127.0.1.1 slave_1".
3. Remove the "_" (or "-") character from that hostname and try again; in your example, you can change it to "slave1".
In my case, I removed the "-" character from the hostname and then it worked.
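For illustration (the IP address below is a placeholder), the /etc/hosts change looks roughly like this; the renamed host also has to be used consistently wherever your Hadoop configuration references that node (e.g. the slaves file):

# /etc/hosts, before
192.168.1.101 slave_1
# /etc/hosts, after (underscore removed)
192.168.1.101 slave1

The underscore matters because Hadoop validates the node address as a host:port authority, and "_" is not a legal character in a hostname, which is why NetUtils.createSocketAddr rejects slave_1:60805.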
Hope that it works for you.
Related
Whenever I run the start-master.sh command on my local machine I get the following error. Please can someone help me fix this issue?
Terminal Error
The error I get in the terminal:
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark-2.0.1-bin-hadoop2.6/logs/spark-andani-org.apache.spark.deploy.master.Master-1-andani.sakha.com.out
failed to launch org.apache.spark.deploy.master.Master:
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:748)
Log Error
If I check the Spark log file, the following is the error:
Exception in thread "main" java.net.BindException: Cannot assign requested address: Service 'sparkMaster' failed after 16 retries! Consider explicitly setting the appropriate port for the service 'sparkMaster' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:485)
at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1089)
at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:430)
at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:415)
at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:903)
at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:198)
at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:348)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:748)
The error occurs because sparkMaster was not able to bind to your internal IP.
Check your /etc/hosts file to see whether it points to the proper hostname; your previous IP address might have changed.
Reconfigure it and run the command once again.
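As a rough sketch (the IP address below is a placeholder), you can check what the master hostname resolves to and, if needed, pin the address Spark binds to:

# check what the master hostname currently resolves to
getent hosts andani.sakha.com
# if it points to a stale address, fix the entry in /etc/hosts, e.g.
192.168.0.10 andani.sakha.com andani
# optionally pin the bind address in conf/spark-env.sh
export SPARK_LOCAL_IP=192.168.0.10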
A worker on one of the supervisors is getting restarted continuously with a ClosedChannelException, but if I run the same topology in another Storm cluster in a different environment, it runs without any errors.
Below is the error I can see in the Storm UI.
java.lang.RuntimeException: java.nio.channels.ClosedChannelException
at org.apache.storm.kafka.ZkCoordinator.refresh(ZkCoordinator.java:103)
at org.apache.storm.kafka.ZkCoordinator.getMyManagedPartitions(ZkCoordinator.java:69)
at org.apache.storm.kafka.KafkaSpout.nextTuple(KafkaSpout.java:129)
at org.apache.storm.daemon.executor$fn__7990$fn__8005$fn__8036.invoke(executor.clj:648)
at org.apache.storm.util$async_loop$fn__624.invoke(util.clj:484)
at clojure.lang.AFn.run(AFn.java:22)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedChannelException
at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)
at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:78)
at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:68)
at kafka.consumer.SimpleConsumer.getOffsetsBefore(SimpleConsumer.scala:127)
at kafka.javaapi.consumer.SimpleConsumer.getOffsetsBefore(SimpleConsumer.scala:79)
at org.apache.storm.kafka.KafkaUtils.getOffset(KafkaUtils.java:75)
at org.apache.storm.kafka.KafkaUtils.getOffset(KafkaUtils.java:65)
at org.apache.storm.kafka.PartitionManager.<init>(PartitionManager.java:94)
at org.apache.storm.kafka.ZkCoordinator.refresh(ZkCoordinator.java:98)
... 6 more
Can anyone please help me find the exact issue? Please let me know if you need any more information.
I faced this issue, and the problem was that the ZooKeeper hostnames were not being resolved from the worker host.
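A quick way to verify this (the hostname and IP below are placeholders for your ZooKeeper quorum) is to test resolution from the supervisor host itself and, if it fails, add the mapping to /etc/hosts on every worker node:

# run these on the supervisor/worker host
getent hosts zk1.example.com
ping -c 1 zk1.example.com
# if resolution fails, add an entry like this to /etc/hosts on each worker
10.0.0.21 zk1.example.com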
I have configured HBase on my local machine; below are my jps tasks:
$ jps
17389 HQuorumPeer
16554 TaskTracker
17894 Jps
16362 JobTracker
15786 NameNode
16078 DataNode
16267 SecondaryNameNode
But when I run
$ hbase shell
it gives me the following error:
NativeException: java.lang.ExceptionInInitializerError:
java.lang.reflect.InvocationTargetException
initialize at /home/rahul/hbase-1.2.4/lib/ruby/hbase/hbase.rb:42
(root) at /home/rahul/hbase-1.2.4/bin/hirb.rb:131
Can anyone help me solve this error? I have wasted several hours on it. Help is really appreciated.
Unfortunately this error is very generic and can occur for a number of reasons. I recently ran into it using the hbase command on HBase 1.2.0-cdh5.16.1 when the wrong URI was configured in core-site.xml and hbase-site.xml (fs.defaultFS and hbase.rootdir respectively). The only way I diagnosed this was to connect programmatically via the Java API (e.g. by following https://www.baeldung.com/hbase), which gave me the full stack trace of the exception that caused the NativeException.
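If you want to check the same two properties in your own setup, these are the entries I had to correct (the host and port below are placeholders; use your actual NameNode address):

<!-- core-site.xml -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode-host:8020</value>
</property>

<!-- hbase-site.xml -->
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://namenode-host:8020/hbase</value>
</property>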
I am running a Storm (Trident) topology that reads Avro records from Kafka and writes them to HBase.
The topology runs as expected in LocalCluster mode, but when using StormSubmitter I'm facing the issues below.
In distributed Hadoop mode I'm getting the error [1] below while launching the YARN application.
In Hadoop local mode (with one box only), YARN spawns the Nimbus server and the Storm UI, but no supervisors are running to execute the spouts/bolts of the topology. I guess the reason might be insufficient memory (4 GB to run the topology plus HBase, HDFS, Kafka, ZooKeeper, etc.).
Can you help me understand the reason for this container failure? There are no errors or info present in the application logs.
[1] The YARN container fails to launch with the error below when running:
storm-yarn launch /homext/storm-yarn.yml --queue default -appname storm-yarn-demo --stormZip /tmp/storm-0.9.zip
Application application_1415038356032_0304 failed 2 times due to AM Container for appattempt_1415038356032_0304_000002 exited with exitCode: 127 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
at org.apache.hadoop.util.Shell.run(Shell.java:418)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 127
.Failing this attempt.. Failing the application.
This log is insufficient to diagnose the problem; all it says is that the container failed to launch. You should look into the container output. Check ${yarn.nodemanager.log-dirs} on the nodes: there will be an application folder (application_1415038356032_0304) and within it a container folder for each attempt (...1415038356032_0304_000002) containing the stderr, stdout and syslog of that attempt. Read those and you'll likely identify the problem.
If these don't exist, look in ${yarn.nodemanager.local-dirs}; you'll find the container launch script (I think it is called container-launch.sh) for this app/container attempt. It contains the actual command used to launch the container. Try running that from a shell prompt and see what you get.
If it fails at an early stage then the logs can be found in HDFS under:
/tmp/logs/<user>/logs/
This should give enough information to diagnose the problem.
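If log aggregation is enabled, you can also pull the aggregated container logs directly with the YARN CLI instead of browsing HDFS, using the application ID from the error above:

yarn logs -applicationId application_1415038356032_0304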
In my case I found a log file:
/tmp/logs/hdfs/logs/application_1426618997634_0004/vagrant-cdh-node4_8041
With some errors like:
/bin/bash: /usr/lib/jvm/java-7-oracle/bin/java: No such file or directory
And fixing the JAVA_HOME environment variable did the trick.
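In that situation the fix is simply to point JAVA_HOME at a JDK that actually exists on the node, for example in hadoop-env.sh (the path below is only an example; use the location of your installed JDK):

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64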
I am trying to run Spark 1.0.1 against an Apache Hadoop 2.2.0 YARN cluster. Both are deployed on my Windows 7 machine. When I try to run the JavaSparkPi sample I get a parsing exception on the Hadoop side. On the Spark side, all the parameters look OK and there is no extra character after the port's five digits. Can anybody help please?
Exception in thread "main" java.lang.NumberFormatException: For input string: "57831'"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:492)
at java.lang.Integer.parseInt(Integer.java:527)
at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
at org.apache.spark.util.Utils$.parseHostPort(Utils.scala:544)
at org.apache.spark.deploy.yarn.ExecutorLauncher.waitForSparkMaster(ExecutorLauncher.scala:163)
at org.apache.spark.deploy.yarn.ExecutorLauncher.run(ExecutorLauncher.scala:101)
at org.apache.spark.deploy.yarn.ExecutorLauncher$$anonfun$main$1.apply$mcV$sp(ExecutorLauncher.scala:263)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ExecutorLauncher.scala:262)
at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ExecutorLauncher.scala)
14/08/11 09:00:38 INFO yarn.Client: Command for starting the Spark ApplicationMaster:
List(%JAVA_HOME%/bin/java, -server, -Xmx512m, -Djava.io.tmpdir=%PWD%/tmp,
-Dspark.tachyonStore.folderName=\"spark-80c61976-f671-41b9-96a0-0c7c5c317fdb\",
-Dspark.yarn.secondary.jars=\"\",
-Dspark.driver.host=\"W01B62GR.UBSPROD.MSAD.UBS.NET\",
-Dspark.app.name=\"JavaSparkPi\",
-Dspark.jars=\"file:/N:/Nick/Spark/spark-1.0.1-bin-hadoop2/bin/../lib/spark-examples-1.0.1-hadoop2.2.0.jar\",
-Dspark.fileserver.uri=\"http://139.149.169.172:57836\",
-Dspark.executor.extraClassPath=\"N:\Nick\Spark\spark-1.0.1-bin-hadoop2\lib\spark-examples-1.0.1-hadoop2.2.0.jar\",
-Dspark.master=\"yarn-client\", -Dspark.driver.port=\"57831\",
-Dspark.driver.extraClassPath=\"N:\Nick\Spark\spark-1.0.1-bin-hadoop2\lib\spark-examples-1.0.1-hadoop2.2.0.jar\",
-Dspark.httpBroadcast.uri=\"http://139.149.169.172:57835\",
-Dlog4j.configuration=log4j-spark-container.properties,
org.apache.spark.deploy.yarn.ExecutorLauncher, --class, notused, --jar , null,
--args 'W01B62GR.UBSPROD.MSAD.UBS.NET:57831' ,
--executor-memory, 1024, --executor-cores, 1,
--num-executors , 2, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
The error looks pretty clear there: 57831' is not a number. 57831 is. Look at your argument:
'W01B62GR.UBSPROD.MSAD.UBS.NET:57831'
The ' should not be there. If you mean those weren't in your original argument, show your command line. I am not sure this will work on Windows without Cygwin.
I got the same problem when running Spark Pi. The issue is that you are not passing the arguments in the correct order. Ensure you list all the options (like --master) first and then specify the job you are running.
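For example, with spark-submit all the options go before the application jar and the application's own arguments come last; something along these lines (paths and values are illustrative):

spark-submit --class org.apache.spark.examples.JavaSparkPi \
  --master yarn-client \
  --num-executors 2 --executor-memory 1g --executor-cores 1 \
  lib/spark-examples-1.0.1-hadoop2.2.0.jar 10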
http://spark.apache.org/docs/latest/running-on-yarn.html