Facing Runtime Exception at Nutch Injector - maven

I am trying to crawl website using Apache Nutch but getting following exception. Tried both in NetBeans and Intellj IDE....Your suggestion would be of great help on this issue...
Exception in thread "main" java.lang.RuntimeException: Injector job did not succeed, job status: FAILED, reason: NA
``` at org.apache.nutch.crawl.Injector.inject(Injector.java:365)
``` at com.yegor256.nutch.MainTest.main(MainTest.java:82)
Process finished with exit code 1

Related

Container exited with a non-zero exit code 1 during wordcount

When I am executing the wordcount program in hadoop-mapreduce-examples using below command
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar wordcount /wordcount/input/test_input.txt /wordcount/output
It is throwing me following exception
Exception from container-launch.
Countainer id: countainer_1540539176003_003_01_000002
Exit code: 1
Stack trace: ExitCodeException exitCode 1;
at org.apache.hadoop.util.Sgell.runCommand(Shell.java:972)
at org.apache.hadoop.util.Sgell.run(Shell.java:869)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.javaL1170)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExcutor.java:235)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurreunt.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1152)
at java.util.concurreunt.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 1
How to fix it?
Sorry I'm new here.
Does it mean there is some memory problems?
You need to start by getting the correct logs.
Look at the url to track the job for the address to the YARN UI.
If that address is not available, you can copy the full app id to the logs command
yarn logs -applicationId application_1540...
From there, you can search for a stacktrace generated by the code.
If you've just setup Hadoop, I would guess that hdfs dfs -ls /wordcount_input/ throws some error about not existing or about permission denied

worker is getting restarted continuously with closedchannel exception in supervisor

worker which is in one of the supervisor is getting restarted continuously and getting Closedchannel exception . But if run the same topology in another storm cluster which is in another environment , it is running without giving any errors.
Below is the error i can see from Storm UI.
java.lang.RuntimeException: java.nio.channels.ClosedChannelException at org.apache.storm.kafka.ZkCoordinator.refresh(ZkCoordinator.java:103) at org.apache.storm.kafka.ZkCoordinator.getMyManagedPartitions(ZkCoordinator.java:69) at org.apache.storm.kafka.KafkaSpout.nextTuple(KafkaSpout.java:129) at org.apache.storm.daemon.executor$fn__7990$fn__8005$fn__8036.invoke(executor.clj:648) at org.apache.storm.util$async_loop$fn__624.invoke(util.clj:484) at clojure.lang.AFn.run(AFn.java:22) at java.lang.Thread.run(Thread.java:745) Caused by: java.nio.channels.ClosedChannelException at kafka.network.BlockingChannel.send(BlockingChannel.scala:100) at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:78) at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:68) at kafka.consumer.SimpleConsumer.getOffsetsBefore(SimpleConsumer.scala:127) at kafka.javaapi.consumer.SimpleConsumer.getOffsetsBefore(SimpleConsumer.scala:79) at org.apache.storm.kafka.KafkaUtils.getOffset(KafkaUtils.java:75) at org.apache.storm.kafka.KafkaUtils.getOffset(KafkaUtils.java:65) at org.apache.storm.kafka.PartitionManager.(PartitionManager.java:94) at org.apache.storm.kafka.ZkCoordinator.refresh(ZkCoordinator.java:98) ... 6 mo
Can any one please help me to find out the exact issue.Please let me know if need any more information.
I faced this issue and problem was that ZooKeeper host names not being resolved from worker host.

Remotely connect to spark on yarn cluster in client mode

I have a remote spark on yarn cluster that if I use rstudio server(web version) hosted on that cluster to connect in client mode I can do the following:
sc <- SparkR::sparkR.init(master = "yarn-client")
However if I try to use rstudio on my local machine to connect to that spark cluster the same way then I have errors:
ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master
...
ERROR Utils: Uncaught exception in thread nioEventLoopGroup-2-2
java.lang.NullPointerException
...
ERROR RBackendHandler: createSparkContext on org.apache.spark.api.r.RRDD failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
A more detailed error message on hadoop application tracking page is like this:
User: blueivy
Name: SparkR
Application Type: SPARK
Application Tags:
State: FAILED
FinalStatus: FAILED
Started: 27-Oct-2015 11:07:09
Elapsed: 4mins, 39sec
Tracking URL: History
Diagnostics:
Application application_1445628650748_0027 failed 2 times due to AM Container for appattempt_1445628650748_0027_000002 exited with exitCode: 10
For more detailed output, check application tracking page:http://master:8088/proxy/application_1445628650748_0027/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1445628650748_0027_02_000001
Exit code: 10
Stack trace: ExitCodeException exitCode=10:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:267)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1143)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:618)
at java.lang.Thread.run(Thread.java:785)
Container exited with a non-zero exit code 10
Failing this attempt. Failing the application.
I have the same configurations and environment for hadoop and spark with remote cluster: spark 1.5.1, hadoop 2.6.0 and ubuntu 14.04. Anyone can help me find what's my mistake here?

issue Running Spark Job on Yarn Cluster

I want to run my spark Job in Hadoop YARN cluster mode, and I am using the following command:
spark-submit --master yarn-cluster
--driver-memory 1g
--executor-memory 1g
--executor-cores 1
--class com.dc.analysis.jobs.AggregationJob
sparkanalitic.jar param1 param2 param3
I am getting error below, kindly suggest whats going wrong, is the command correct or not. I am using CDH 5.3.1.
Diagnostics: Application application_1424284032717_0066 failed 2 times due
to AM Container for appattempt_1424284032717_0066_000002 exited with
exitCode: 15 due to: Exception from container-launch.
Container id: container_1424284032717_0066_02_000001
Exit code: 15
Stack trace: ExitCodeException exitCode=15:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:197)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 15
.Failing this attempt.. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.hdfs
start time: 1424699723648
final status: FAILED
tracking URL: http://myhostname:8088/cluster/app/application_1424284032717_0066
user: hdfs
2015-02-23 19:26:04 DEBUG Client - stopping client from cache: org.apache.hadoop.ipc.Client#4085f1ac
2015-02-23 19:26:04 DEBUG Utils - Shutdown hook called
2015-02-23 19:26:05 DEBUG Utils - Shutdown hook called
Any help would be greatly appreciated.
It can mean a lot of things, for us, we get the similar error message because of unsupported Java class version, and we fixed the problem by deleting the referenced Java class in our project.
Use this command to see the detailed error message:
yarn logs -applicationId application_1424284032717_0066
You should remove ".setMaster("local")" in the code.
The command looks correct.
What I've come across is that the "exit code 15" normally indicates a TableNotFound Exception. That usually means there's an error in the code you're submitting.
You can check this by visiting the tracking URL.
For me exit code issue solved by placing hive-site.xml in spark/conf directory.
Remove the line "spark.master":"local[*]" in the spark configuration file if you are running the spark jobs under cluster.
Suppose run on the local pc, include it.
Mani

Submit Job in Spark using Yarn Cluster

I am unable to submit the job in yarn cluster.The job is running fine under yarn-client option. When submit it to yarn-cluster only this log is coming multiple times.
Application report for application_1421828570504_0002 (state: ACCEPTED)
and got failed with the following exception.
diagnostics: Application application_1421828570504_0002 failed 10 times due to AM Container for app
attempt_1421828570504_0002_000010 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
You should have a look at the logs of your application:
> yarn logs --applicationId application_1421828570504_0002
This will yield some debug information of the actual run within the spark containers.
Since it is running locally but not on the cluster my wild guess would be a missing SparkContext definition. Have a look at my answer to this question for a fix.

Resources