Nutch job failed to post to Solr - hadoop

Environment: Nutch 1.6 and Solr 4.2.1.
Nutch and Solr are both running. Posting to Solr on its own works well, but posting from Nutch fails with "Job failed".
I use the following command to run the Nutch crawl and post the results:
bin/nutch crawl urls -solr http://10.10.10.10:8983/solr/tsdcr -dir crawl -depth 5 -topN 5 -threads 50
Output:
SolrIndexer: finished at 2015-03-27 03:04:29, elapsed: 00:01:46
SolrDeleteDuplicates: starting at 2015-03-27 03:04:29
SolrDeleteDuplicates: Solr url: http://10.10.10.10:8983/solr/tsdcr
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:373)
at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:353)
at org.apache.nutch.crawl.Crawl.run(Crawl.java:153)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
Hadoop log:
2015-03-26 03:04:00,832 DEBUG httpclient.HttpMethodBase - Resorting to protocol version default close connection policy
2015-03-26 03:04:00,832 DEBUG httpclient.HttpMethodBase - Should NOT close connection, using HTTP/1.1
2015-03-26 03:04:00,832 TRACE httpclient.HttpConnection - enter HttpConnection.isResponseAvailable()
2015-03-26 03:04:00,832 TRACE httpclient.HttpConnection - enter HttpConnection.releaseConnection()
2015-03-26 03:04:00,832 DEBUG httpclient.HttpConnection - Releasing connection back to connection manager.
2015-03-26 03:04:00,854 WARN mapred.FileOutputCommitter - Output path is null in cleanup
2015-03-26 03:04:00,854 WARN mapred.LocalJobRunner - job_local_0017
java.lang.NullPointerException
at org.apache.hadoop.io.Text.encode(Text.java:388)
at org.apache.hadoop.io.Text.set(Text.java:178)
at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:270)
at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:241)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:236)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:216)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Any help is much appreciated.
Thank you.
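The Hadoop trace shows `Text.set` being handed a null value inside the record reader of `SolrDeleteDuplicates`, which is what would happen if a document returned by Solr lacks a field that the dedup job reads. A minimal Python sketch for checking that possibility follows. It assumes the missing field is `digest`; that field name is an assumption, not something the log confirms, and the helper name is hypothetical. The code only builds a query URL for Solr's standard `/select` handler and does not contact the server.

```python
from urllib.parse import urlencode


def missing_field_query_url(solr_url, field):
    """Build a /select URL that counts documents lacking `field`.

    `field` is an assumption here (for example "digest"); adjust it to
    whatever your Solr schema actually defines.
    """
    params = {
        "q": f"-{field}:[* TO *]",  # pure negative query: docs with no value
        "rows": 0,                  # only the numFound count is needed
        "wt": "json",
    }
    return f"{solr_url.rstrip('/')}/select?{urlencode(params)}"


url = missing_field_query_url("http://10.10.10.10:8983/solr/tsdcr", "digest")
print(url)
```

If opening the printed URL reports a non-zero `numFound`, some indexed documents have no value for that field, which would be worth ruling out before looking further.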

Related

Apache Nifi not starting on Windows 7 - because "extensionMapping" is null

I am a beginner trying to install NiFi for the first time, and I am having trouble getting it to work.
I have Windows 7 32-bit. I installed LibericaJDK-19 just to get NiFi working; it did not work with Java 8 for me either.
I am trying nifi-1.19.1 with the default conf file. I have set JAVA_HOME to point to the new JDK 19.
Last lines of nifi-bootstrap.log
2023-01-18 21:51:13,373 INFO [NiFi Bootstrap Command Listener] org.apache.nifi.bootstrap.RunNiFi Apache NiFi now running and listening for Bootstrap requests on port 51200
2023-01-18 21:53:44,866 ERROR [NiFi logging handler] org.apache.nifi.StdErr Failed to start web server: Cannot invoke "org.apache.nifi.nar.ExtensionMapping.size()" because "extensionMapping" is null
2023-01-18 21:53:44,866 ERROR [NiFi logging handler] org.apache.nifi.StdErr Shutting down...
2023-01-18 21:53:45,515 INFO [main] org.apache.nifi.bootstrap.RunNiFi NiFi never started. Will not restart NiFi
At the start of nifi-app.log I get this warning:
WARN [main] org.apache.nifi.nar.NarUnpacker Unable to load NAR library bundles due to java.util.zip.ZipException: zip END header not found Will proceed without loading any further Nar bundles
java.util.zip.ZipException: zip END header not found
at java.base/java.util.zip.ZipFile$Source.findEND(ZipFile.java:1483)
at java.base/java.util.zip.ZipFile$Source.initCEN(ZipFile.java:1491)
at java.base/java.util.zip.ZipFile$Source.<init>(ZipFile.java:1329)
at java.base/java.util.zip.ZipFile$Source.get(ZipFile.java:1292)
at java.base/java.util.zip.ZipFile$CleanableResource.<init>(ZipFile.java:710)
at java.base/java.util.zip.ZipFile.<init>(ZipFile.java:243)
at java.base/java.util.zip.ZipFile.<init>(ZipFile.java:172)
at java.base/java.util.jar.JarFile.<init>(JarFile.java:345)
at java.base/java.util.jar.JarFile.<init>(JarFile.java:316)
at java.base/java.util.jar.JarFile.<init>(JarFile.java:282)
at org.apache.nifi.nar.NarUnpacker.determineDocumentedNiFiComponents(NarUnpacker.java:605)
at org.apache.nifi.nar.NarUnpacker.unpackDocumentation(NarUnpacker.java:550)
at org.apache.nifi.nar.NarUnpacker.unpackBundleDocs(NarUnpacker.java:287)
at org.apache.nifi.nar.NarUnpacker.mapExtensions(NarUnpacker.java:271)
at org.apache.nifi.nar.NarUnpacker.unpackNars(NarUnpacker.java:220)
at org.apache.nifi.nar.NarUnpacker.unpackNars(NarUnpacker.java:89)
at org.apache.nifi.nar.NarUnpacker.unpackNars(NarUnpacker.java:83)
at org.apache.nifi.nar.NarUnpacker.unpackNars(NarUnpacker.java:74)
at org.apache.nifi.NiFi.<init>(NiFi.java:142)
at org.apache.nifi.NiFi.<init>(NiFi.java:83)
at org.apache.nifi.NiFi.main(NiFi.java:332)
And at the end of nifi-app.log:
2023-01-18 21:53:44,866 WARN [main] org.apache.nifi.web.server.JettyServer Failed to start web server... shutting down.
java.lang.NullPointerException: Cannot invoke "org.apache.nifi.nar.ExtensionMapping.size()" because "extensionMapping" is null
at org.apache.nifi.documentation.DocGenerator.generate(DocGenerator.java:61)
at org.apache.nifi.web.server.JettyServer.start(JettyServer.java:788)
at org.apache.nifi.NiFi.<init>(NiFi.java:172)
at org.apache.nifi.NiFi.<init>(NiFi.java:83)
at org.apache.nifi.NiFi.main(NiFi.java:332)
2023-01-18 21:53:44,889 INFO [Thread-0] org.apache.nifi.NiFi Application Server shutdown started
2023-01-18 21:53:44,890 INFO [Thread-0] org.apache.nifi.NiFi Application Server shutdown completed
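`java.util.zip.ZipException: zip END header not found` generally means that a file being opened as a zip or jar archive is truncated or otherwise not a valid archive, for example after an interrupted download or extraction. A minimal Python sketch for locating such files follows. The function name is hypothetical, and scanning the NiFi `lib` and `work` directories is only a suggestion for where to look, not something the log states.

```python
import zipfile
from pathlib import Path


def find_invalid_archives(root, suffixes=(".nar", ".jar")):
    """Return paths under `root` with the given suffixes that are not valid zips."""
    bad = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            # is_zipfile looks for the end-of-central-directory record,
            # the same "END header" the Java exception complains about.
            if not zipfile.is_zipfile(path):
                bad.append(path)
    return bad
```

Calling `find_invalid_archives` on the NiFi installation directory lists any `.nar` or `.jar` file that cannot be opened as an archive; re-downloading or re-extracting the distribution would be the natural next step if it reports anything.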

I can't start Apache Nifi

When I run run-nifi.bat it pops up for a split second but then auto closes. I don't really understand this, I just need it for a university class and it hadn't been properly explained, so I'm just trying on my own really.
I get this in my nifi-app.log:
2021-05-29 17:07:30,179 INFO [main] org.apache.nifi.NiFi Launching NiFi...
2021-05-29 17:07:30,450 INFO [main] org.apache.nifi.security.kms.CryptoUtils Determined default nifi.properties path to be 'D:\SYSTEM\Downloads\nifi-1.13.2-bin\nifi-1.13.2\.\conf\nifi.properties'
2021-05-29 17:07:30,454 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Loaded 188 properties from D:\SYSTEM\Downloads\nifi-1.13.2-bin\nifi-1.13.2\.\conf\nifi.properties
2021-05-29 17:07:30,465 INFO [main] org.apache.nifi.NiFi Loaded 188 properties
2021-05-29 17:07:30,705 INFO [main] org.apache.nifi.BootstrapListener Started Bootstrap Listener, Listening for incoming requests on port 63487
2021-05-29 17:07:30,711 ERROR [main] org.apache.nifi.NiFi Failure to launch NiFi due to java.net.ConnectException: Connection refused: connect
java.net.ConnectException: Connection refused: connect
at java.base/sun.nio.ch.Net.connect0(Native Method)
at java.base/sun.nio.ch.Net.connect(Net.java:576)
at java.base/sun.nio.ch.Net.connect(Net.java:565)
at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:588)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:333)
at java.base/java.net.Socket.connect(Socket.java:645)
at java.base/java.net.Socket.connect(Socket.java:595)
at org.apache.nifi.BootstrapListener.sendCommand(BootstrapListener.java:102)
at org.apache.nifi.BootstrapListener.start(BootstrapListener.java:74)
at org.apache.nifi.NiFi.<init>(NiFi.java:102)
at org.apache.nifi.NiFi.<init>(NiFi.java:71)
at org.apache.nifi.NiFi.main(NiFi.java:303)
2021-05-29 17:07:30,712 INFO [Thread-0] org.apache.nifi.NiFi Initiating shutdown of Jetty web server...
2021-05-29 17:07:30,712 INFO [Thread-0] org.apache.nifi.NiFi Jetty web server shutdown completed (nicely or otherwise).
I've tried editing the web properties in the config files in case the defaults were wrong. These are the current values, but the errors are the same:
nifi.web.http.host=localhost
nifi.web.http.port=9090
nifi.web.http.network.interface.default=
I have Windows 10 Home Edition.
NiFi requires Java 8 or Java 11 to run. So your environment variables should point to the correct directory with Java 8 or Java 11.
Have you tried setting the JAVA_HOME environment variable? I would recommend checking the config files and pointing them at the Java installation.
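Since these answers hinge on which Java version NiFi actually picks up, a small check can help. The Python sketch below turns a version string of the kind printed by `java -version` (for example `1.8.0_292` or `11.0.11`) into a major version number. The function names are hypothetical, and the supported set of 8 and 11 is taken from the answer above, so it applies to this NiFi release only.

```python
import re

SUPPORTED_MAJORS = {8, 11}  # per the answer above; applies to this NiFi release


def java_major_version(version):
    """Map '1.8.0_292' -> 8, '11.0.11' -> 11, '19.0.1' -> 19."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version.strip())
    if not match:
        raise ValueError(f"unrecognised Java version: {version!r}")
    first, second = int(match.group(1)), match.group(2)
    # Java 8 and earlier report themselves as 1.x
    if first == 1 and second is not None:
        return int(second)
    return first


def is_supported(version):
    """True if the version's major number is in SUPPORTED_MAJORS."""
    return java_major_version(version) in SUPPORTED_MAJORS
```

Running `java -version` from the same shell that launches `run-nifi.bat` and passing the reported version to `is_supported` shows whether JAVA_HOME points at a suitable installation.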
You might be missing a URL ACL.
You can try the command below:
netsh http add urlacl url=http://computername:port/ user=username
Source: https://serverfault.com/a/246798/191420

Error sending data to APM even after successful connectivity

I am able to establish the APM connection:
2021-03-09 17:45:05,741 [Attach Listener] INFO co.elastic.apm.agent.configuration.StartupInfo - VM Arguments: [-XX:TieredStopAtLevel=1, -Xmx6g, -Dfile.encoding=UTF-8, -Duser.country=IN, -Duser.language=en, -Duser.variant]
2021-03-09 17:45:08,192 [Attach Listener] INFO co.elastic.apm.agent.impl.ElasticApmTracer - Tracer switched to RUNNING state
2021-03-09 17:45:08,734 [elastic-apm-server-healthcheck] INFO co.elastic.apm.agent.report.ApmServerHealthChecker - Elastic APM server is available: { "build_date": "2021-02-15T12:37:48Z", "build_sha": "e77061bb3aaedae5ae8dd0ca193eb662513aedde", "version": "7.11.0"}
But after the connection is established, it still throws this error. What could be wrong here? I would appreciate any input on this.
2021-03-09 17:45:53,484 [elastic-apm-server-reporter] INFO co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Backing off for 0 seconds (+/-10%)
2021-03-09 17:45:53,489 [elastic-apm-server-reporter] ERROR co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Error sending data to APM server: Read timed out, response code is -1
2021-03-09 17:45:53,489 [elastic-apm-server-reporter] WARN co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - null
2021-03-09 17:46:08,890 [elastic-apm-server-reporter] INFO co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Backing off for 1 seconds (+/-10%)
2021-03-09 17:46:09,922 [elastic-apm-server-reporter] ERROR co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Error sending data to APM server: Read timed out, response code is -1
2021-03-09 17:46:09,922 [elastic-apm-server-reporter] WARN co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - null
Check the URLs in your kibana.yml file. When I set up APM on my machine, some of my URLs (elasticsearch.hosts, xpack.fleet.outputs) defaulted to my then-current IP address (instead of localhost), which changed after a reboot.
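The check described in this answer can be scripted. The Python sketch below pulls every `http(s)://` URL out of a config file's text and reports those whose host is neither `localhost` nor a loopback address, so a stale IP stands out. The function name and the sample config values are made up for illustration, and the code uses a plain regular expression rather than a YAML parser.

```python
import ipaddress
import re
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s\"',\]]+")


def non_local_urls(config_text):
    """Return URLs whose host is not localhost or a loopback IP."""
    found = []
    for url in URL_RE.findall(config_text):
        host = urlsplit(url).hostname or ""
        if host == "localhost":
            continue
        try:
            if ipaddress.ip_address(host).is_loopback:
                continue
        except ValueError:
            pass  # a hostname, not an IP literal: report it
        found.append(url)
    return found


# Hypothetical config snippet; the address and structure are illustrative only.
sample = '''
elasticsearch.hosts: ["http://192.0.2.15:9200"]
server.host: "localhost"
xpack.fleet.outputs: [{hosts: ["http://localhost:9200"]}]
'''
print(non_local_urls(sample))
```

Any URL it reports is worth comparing against the machine's current address.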

Storm worker not starting

I am trying to run a Storm topology, but the Storm worker refuses to start. When I run the java command that invokes the worker process, I get the following error:
Exception: java.lang.StackOverflowError thrown from the UncaughtExceptionHandler in thread "main"
I am not able to find what is causing this. Has anyone faced a similar issue?
Edit:
When I run the worker process with the -V flag, I get the following error:
588 [main] INFO org.apache.zookeeper.server.ZooKeeperServer - Server environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
588 [main] INFO org.apache.zookeeper.server.ZooKeeperServer - Server environment:java.io.tmpdir=/tmp
588 [main] INFO org.apache.zookeeper.server.ZooKeeperServer - Server environment:java.compiler=<NA>
588 [main] INFO org.apache.zookeeper.server.ZooKeeperServer - Server environment:os.name=Linux
588 [main] INFO org.apache.zookeeper.server.ZooKeeperServer - Server environment:os.arch=amd64
588 [main] INFO org.apache.zookeeper.server.ZooKeeperServer - Server environment:os.version=3.5.0-23-generic
588 [main] INFO org.apache.zookeeper.server.ZooKeeperServer - Server environment:user.name=storm
588 [main] INFO org.apache.zookeeper.server.ZooKeeperServer - Server environment:user.home=/home/storm
588 [main] INFO org.apache.zookeeper.server.ZooKeeperServer - Server environment:user.dir=/home/storm/storm-0.9.0.1
797 [main] ERROR org.apache.zookeeper.server.NIOServerCnxn - Thread Thread[main,5,main] died
PS: When I run the same topology in a local cluster it works fine; it only fails to start when I deploy it in cluster mode.
Just found out the issue. The jar I created to upload to the Storm cluster was kept in the Storm base directory pics. This somehow created a conflict that was not shown in the log file; in fact, the log file was never created.
Make sure no external jars are present in the base Storm folder from which you start Storm. It is a really tricky error: there is no indication of why it happens until you stumble on the cause.
I hope the Storm developers add this to the logs so that users facing this issue can pinpoint exactly why it is happening.
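The check suggested in this answer can be sketched in a few lines of Python. The function name is hypothetical, and treating jars whose names start with `storm-` as part of the distribution is an assumption; adjust the prefix to match what your Storm release actually ships in its base directory.

```python
from pathlib import Path


def stray_jars(storm_home, shipped_prefix="storm-"):
    """List .jar files directly in `storm_home` that do not look shipped.

    `shipped_prefix` is an assumption about how the distribution names
    its own jars; subdirectories such as lib/ are deliberately ignored.
    """
    return sorted(
        p.name
        for p in Path(storm_home).glob("*.jar")
        if p.is_file() and not p.name.startswith(shipped_prefix)
    )
```

For the setup in the question this would be called as `stray_jars("/home/storm/storm-0.9.0.1")`; anything it returns is a candidate for moving out of the base directory.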

While running a topology in storm we are getting error like this

While running a topology in Storm we get errors like this:
8983 [Thread-6] INFO com.netflix.curator.framework.imps.CuratorFrameworkImpl - Starting
9144 [main] INFO backtype.storm.daemon.nimbus - Shutting down master
9199 [Thread-6-EventThread] INFO backtype.storm.zookeeper - Zookeeper state update: :connected:none
9241 [main] INFO backtype.storm.daemon.nimbus - Shut down master
9273 [Thread-6] INFO com.netflix.curator.framework.imps.CuratorFrameworkImpl - Starting
9306 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000] WARN org.apache.zookeeper.server.NIOServerCnxn - EndOfStreamException: Unable to read additional data from client sessionid 0x143af55728d0003, likely client has closed socket
9354 [main] INFO backtype.storm.daemon.supervisor - Shutting down c094c3b1-a378-4c4f-af35-9278647c217a:4beddc09-4675-4fb9-8bdc-9cf5013ce9ca
9358 [main] INFO backtype.storm.daemon.supervisor - Shut down c094c3b1-a378-4c4f-af35-9278647c217a:4beddc09-4675-4fb9-8bdc-9cf5013ce9ca
9361 [main] INFO backtype.storm.daemon.supervisor - Shutting down supervisor c094c3b1-a378-4c4f-af35-9278647c217a
9364 [Thread-5] INFO backtype.storm.event - Event manager interrupted
9369 [Thread-6] INFO backtype.storm.event - Event manager interrupted
9425 [main] INFO backtype.storm.daemon.supervisor - Shutting down supervisor 386d8d71-c9b5-4b51-bd6e-f9f605034ea0
9428 [Thread-8] INFO backtype.storm.event - Event manager interrupted
9429 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000] WARN org.apache.zookeeper.server.NIOServerCnxn - EndOfStreamException: Unable to read additional data from client sessionid 0x143af55728d0007, likely client has closed socket
9429 [Thread-9] INFO backtype.storm.event - Event manager interrupted
9473 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000] WARN org.apache.zookeeper.server.NIOServerCnxn - EndOfStreamException: Unable to read additional data from client sessionid 0x143af55728d0009, likely client has closed socket
9476 [main] INFO backtype.storm.testing - Shutting down in process zookeeper
9503 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000] WARN org.apache.zookeeper.server.NIOServerCnxn - Ignoring exception
java.nio.channels.ClosedChannelException: null
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:211) ~[na:1.7.0_03]
at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:242) ~[zookeeper-3.3.3.jar:3.3.3-1073969]
9510 [main] INFO backtype.storm.testing - Done shutting down in process zookeeper
9513 [main] INFO backtype.storm.testing - Deleting temporary path C:\Users\sowmiya\AppData\Local\Temp\c9b1bc1a-a950-4098-af77-f81a4d2b112f
9520 [main] INFO backtype.storm.testing - Deleting temporary path C:\Users\sowmiya\AppData\Local\Temp\7e75c468-18ea-4787-a4ac-496fb108db71
9527 [main] INFO backtype.storm.testing - Unable to delete file: C:\Users\sowmiya\AppData\Local\Temp\7e75c468-18ea-4787-a4ac-496fb108db71\version-2\log.1
9529 [main] INFO backtype.storm.testing - Deleting temporary path C:\Users\sowmiya\AppData\Local\Temp\fa7b3c9b-ac93-4090-b9e2-63f10019e61f
9543 [main] INFO backtype.storm.testing - Deleting temporary path C:\Users\sowmiya\AppData\Local\Temp\55f1fd11-508e-43bb-b340-0d9b79f3af33
9579 [Thread-6-EventThread] INFO com.netflix.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
9580 [ConnectionStateManager-0] WARN com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
9583 [Thread-6-EventThread] WARN backtype.storm.cluster - Received event :disconnected::none: with disconnected Zookeeper.
11232 [Thread-6-SendThread(localhost:2000)] WARN org.apache.zookeeper.ClientCnxn - Session 0x143af55728d000b for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.7.0_03]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701) ~[na:1.7.0_03]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119) ~[zookeeper-3.3.3.jar:3.3.3-1073969]
13992 [Thread-6-SendThread(localhost:2000)] WARN org.apache.zookeeper.ClientCnxn - Session 0x143af55728d000b for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.7.0_03]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701) ~[na:1.7.0_03]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
When we try to run the topology jar file, all the processes (Nimbus, ZooKeeper and the supervisor) die. Please help us understand why this happens.
Please help us fix this error so that we can proceed further.
Thank you,
Sowmiya
Priya
This looks like a ZooKeeper issue: your processes appear to be unable to connect to ZooKeeper. I can't say more without more information.
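A quick way to gather some of that information is to test whether anything is accepting TCP connections on the ZooKeeper port. The Python sketch below is a generic reachability check; the function name is hypothetical, and the host and port in the usage note are the `localhost:2000` values that appear in the log above.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False
```

Calling `can_connect("localhost", 2000)` while the topology is running tells you whether the "Connection refused" in the log reflects ZooKeeper not listening at all, as opposed to a problem further up the stack.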
