Nutch 2.3.1 on Cassandra couldn't start - hadoop

I'm trying to run Nutch 2.3.1 with Cassandra. I followed the steps at http://wiki.apache.org/nutch/Nutch2Cassandra . Finally, when I try to start Nutch with the command:
bin/crawl urls/ test http://localhost:8983/solr/ 2
I got the following exception:
GeneratorJob: starting
GeneratorJob: filtering: false
GeneratorJob: normalizing: false
GeneratorJob: topN: 50000
GeneratorJob: java.lang.RuntimeException: job failed: name=[test]generate: 1454483370-31180, jobid=job_local1380148534_0001
at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:227)
at org.apache.nutch.crawl.GeneratorJob.generate(GeneratorJob.java:256)
at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:322)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.GeneratorJob.main(GeneratorJob.java:330)
Error running:
/home/user/apache-nutch-2.3.1/runtime/local/bin/nutch generate -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true -topN 50000 -noNorm -noFilter -adddays 0 - crawlId webmd -batchId 1454483370-31180
Failed with exit value 255.
When I check logs/hadoop.log, here's the error message:
2016-02-03 15:18:14,741 ERROR connection.HConnectionManager - Could not start connection pool for host localhost(127.0.0.1):9160
...
2016-02-03 15:18:15,185 ERROR store.CassandraStore - All host pools marked down. Retry burden pushed out to client.
me.prettyprint.hector.api.exceptions.HectorException: All host pools marked down. Retry burden pushed out to client.
at me.prettyprint.cassandra.connection.HConnectionManager.getClientFromLBPolicy(HConnectionManager.java:390)
But my Cassandra server is up:
runtime/local$ netstat -l |grep 9160
tcp 0 0 172.16.230.130:9160 *:* LISTEN
Can anyone help with this issue? Thanks.

The address Cassandra is listening on is not localhost; it's 172.16.230.130, as your netstat output shows. That is the reason Nutch cannot connect to the Cassandra store.
Hope this helps,
Le Quoc Do
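Two things worth checking, depending on which side you prefer to change (the property names below are the usual ones for the gora-cassandra backend and Cassandra's Thrift interface; verify them against your versions):

Option 1: point Gora at the address Cassandra actually listens on, in conf/gora.properties:
gora.cassandrastore.servers=172.16.230.130:9160

Option 2: make Cassandra listen on localhost, by setting the Thrift RPC address in cassandra.yaml and restarting Cassandra:
rpc_address: 127.0.0.1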

Related

Expected hostname at index 7 for neo4j bolt (3.5.21)

We run Neo4j (3.5.21) on an EC2 instance. Today, after I restarted the server, I noticed this error:
Expected hostname at index 7: bolt://:7687". Starting Neo4j failed: Component 'org.neo4j.server.AbstractNeoServer$ServerComponentsLifecycleAdapter#75401424' was successfully initialized, but failed to start
Service start logs:
Active database: graph.db
Directories in use:
home: /var/lib/neo4j
config: /etc/neo4j
logs: /var/log/neo4j
plugins: /var/lib/neo4j/plugins
import: /var/lib/neo4j/import
data: /var/lib/neo4j/data
certificates: /var/lib/neo4j/certificates
run: /var/run/neo4j
Starting Neo4j.
WARNING: Max 1024 open files allowed, minimum of 40000 recommended. See the Neo4j manual.
Started neo4j (pid 22577). It is available at http://0.0.0.0:7474/
There may be a short delay until the server is ready.
See /var/log/neo4j/neo4j.log for current status.
This is what I see in neo4j.log:
2022-12-03 20:29:49.886+0000 INFO Bolt enabled on 0.0.0.0:7687.
2022-12-03 20:29:51.968+0000 INFO Started.
2022-12-03 20:29:52.121+0000 INFO Stopping...
2022-12-03 20:29:52.231+0000 INFO Stopped.
2022-12-03 20:29:52.233+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.AbstractNeoServer$ServerComponentsLifecycleAdapter#75401424' was successfully initialized, but failed to start. Please see the attached cause exception "Expected hostname at index 7: bolt://:7687". Starting Neo4j failed: Component 'org.neo4j.server.AbstractNeoServer$ServerComponentsLifecycleAdapter#75401424' was successfully initialized, but failed to start. Please see the attached cause exception "Expected hostname at index 7: bolt://:7687".
org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.AbstractNeoServer$ServerComponentsLifecycleAdapter#75401424' was successfully initialized, but failed to start. Please see the attached cause exception "Expected hostname at index 7: bolt://:7687".
at org.neo4j.server.exception.ServerStartupErrors.translateToServerStartupError(ServerStartupErrors.java:45)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:187)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:124)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:91)
at org.neo4j.server.CommunityEntryPoint.main(CommunityEntryPoint.java:32)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.server.AbstractNeoServer$ServerComponentsLifecycleAdapter#75401424' was successfully initialized, but failed to start. Please see the attached cause exception "Expected hostname at index 7: bolt://:7687".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:473)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:180)
... 3 more
Caused by: org.neo4j.graphdb.config.InvalidSettingException: Unable to construct bolt discoverable URI using '' as hostname: Expected hostname at index 7: bolt://:7687
at org.neo4j.server.rest.discovery.DiscoverableURIs$Builder.add(DiscoverableURIs.java:133)
at org.neo4j.server.rest.discovery.DiscoverableURIs$Builder.lambda$addBoltConnectorFromConfig$1(DiscoverableURIs.java:155)
at java.util.Optional.ifPresent(Optional.java:159)
at org.neo4j.server.rest.discovery.DiscoverableURIs$Builder.addBoltConnectorFromConfig(DiscoverableURIs.java:145)
at org.neo4j.server.rest.discovery.CommunityDiscoverableURIs.communityDiscoverableURIs(CommunityDiscoverableURIs.java:38)
at org.neo4j.server.CommunityNeoServer.lambda$createDBMSModule$0(CommunityNeoServer.java:99)
at org.neo4j.server.modules.DBMSModule.start(DBMSModule.java:59)
at org.neo4j.server.AbstractNeoServer.startModules(AbstractNeoServer.java:249)
at org.neo4j.server.AbstractNeoServer.access$700(AbstractNeoServer.java:102)
at org.neo4j.server.AbstractNeoServer$ServerComponentsLifecycleAdapter.start(AbstractNeoServer.java:541)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
... 5 more
Caused by: java.net.URISyntaxException: Expected hostname at index 7: bolt://:7687
at java.net.URI$Parser.fail(URI.java:2847)
at java.net.URI$Parser.failExpecting(URI.java:2853)
at java.net.URI$Parser.parseHostname(URI.java:3389)
at java.net.URI$Parser.parseServer(URI.java:3235)
at java.net.URI$Parser.parseAuthority(URI.java:3154)
at java.net.URI$Parser.parseHierarchical(URI.java:3096)
at java.net.URI$Parser.parse(URI.java:3052)
at java.net.URI.<init>(URI.java:673)
at org.neo4j.server.rest.discovery.DiscoverableURIs$Builder.add(DiscoverableURIs.java:128)
... 15 more
2022-12-03 20:29:52.243+0000 INFO Neo4j Server shutdown initiated by request
EC2: t3.large
OS: Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-1092-aws x86_64)
I have already tried restarting the server and restarting the service multiple times, without any success. We have not changed anything on the networking side (VPC, subnet, security groups, network interface, etc.).
I'm curious whether there's a config I'm missing. Any help will be much appreciated.
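For what it's worth, the stack trace shows the failure is in building the Bolt discovery URI: the advertised hostname resolves to an empty string, producing "bolt://:7687". A hedged sketch of one possible fix, assuming Neo4j 3.5's config keys and the /etc/neo4j path shown above (the IP below is a placeholder for your instance's address):

# /etc/neo4j/neo4j.conf
# Explicitly set the address used when advertising connectors such as Bolt,
# so the discovery URI does not end up with an empty hostname.
dbms.connectors.default_advertised_address=172.31.0.10

Then restart the service:
sudo service neo4j restart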

java.net.ConnectException error when running yarn

I'm getting an error when running a job on YARN. HDFS and YARN both start up fine, jps shows everything normal, pseudo-distributed mode on HDFS works perfectly, and I have triple- and quadruple-checked my configuration files. Whenever I attempt to run a YARN job, however, this happens:
INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From serverA/IPaddress to serverB:30170 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getNewApplication over null after 6 failover attempts. Trying to failover after sleeping for 44428ms.
YARN then attempts to connect over and over again until I forcefully quit the process. Any ideas why this is happening?
Can you see the YARN web UI?
How did you start HDFS and YARN?
You can try ./sbin/start-all.sh
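The client in that log is trying to reach the ResourceManager at serverB:30170 and being refused, so a first check is whether the ResourceManager is actually listening there. A hedged sketch, assuming serverB is meant to host the ResourceManager and that yarn-site.xml carries these standard properties (the values here mirror the error message, not your actual config):

<!-- yarn-site.xml -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>serverB</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>serverB:30170</value>
</property>

# On serverB, confirm something is listening on that port:
jps | grep ResourceManager
netstat -tlnp | grep 30170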

Impala Daemon Not Starting in cloudera-quickstart-vm-5.10

Impala is not working in cloudera-quickstart-vm-5.10 because the Impala Daemon is not starting.
Following is the status of the three required roles for Impala:
Impala Daemon is down
Impala Catalog Server has bad health
Impala StateStore has good health
I'm getting the following error messages:
Jul 22, 9:57:33.262 AM ERROR impala-server.cc:252
Could not read the root directory at hdfs://quickstart.cloudera:8020. Error was: Call From quickstart.cloudera/127.0.0.1 to quickstart.cloudera:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Jul 22, 9:57:33.262 AM ERROR impala-server.cc:255
Aborting Impala Server startup due to improper configuration. Impalad exiting.
I tried restarting the service through the command line.
[cloudera@quickstart ~]$ sudo service impala-server restart
Stopped Impala Server: [ OK ]
Started Impala Server (impalad): [ OK ]
[cloudera@quickstart ~]$ service impala-server status
Impala Server is dead and pid file exists [FAILED]
It is still not working.
I'm not sure whether a configuration change would fix it; if so, I could not find what exactly needs to be changed for it to work.
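The impalad error is a connection refusal to the HDFS NameNode at quickstart.cloudera:8020, so Impala cannot come up until HDFS does. A hedged first check, assuming the quickstart VM's standard CDH service names:

# Is the NameNode running?
sudo service hadoop-hdfs-namenode status
# If not, start it, verify HDFS responds, then restart Impala:
sudo service hadoop-hdfs-namenode start
hdfs dfsadmin -report
sudo service impala-server restart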

Ambari can't start namenode with a connection exception

I have been stuck on this for a few days. I tried a lot of methods from the Internet, but none of them worked, so I registered on Stack Overflow to write my first question.
My environment is HDP 2.4 on Ubuntu 14.04 LTS.
The log information is like this:
safemode: Call From master/9.119.131.105 to master:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
2016-04-21 15:59:57,796 - Retrying after 10 seconds. Reason: Execution of '/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://master:8020 -safemode get | grep 'Safe mode is OFF'' returned 1. safemode: Call From master/9.119.131.105 to master:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
(1). At first I tried to bypass safe mode, but that isn't the key to this issue.
(2). Then I started to investigate port 8020 on master; master is the namenode of my cluster, and I am sure the hosts file is correct:
9.119.131.105 master
(3). SSH works; the four nodes in the cluster can log into each other without passwords.
(4). I'm sure the firewall is off:
# /etc/init.d/iptables stop
# ufw status
Status: inactive
(5). I also tried to open port 8020 manually:
# iptables -A INPUT -p tcp --dport 8020 -j ACCEPT
It still didn't work.
(6). I tried:
# telnet master 8020
but:
telnet: Unable to connect to remote host: Connection refused
(7). I found that not only port 8020 doesn't work, but port 50070 doesn't either.
(8). I'm sure the parameters in the configuration files are set to 8020:
fs.defaultFS = hdfs://master:8020
dfs.namenode.rpc-address = master:8020
I'm hoping somebody can help me; I would really appreciate it.
Thank you!
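"Connection refused" on both 8020 and 50070 from the namenode host itself usually means the NameNode process is not running (or is bound to a different address), rather than a firewall problem. A hedged diagnostic sketch, assuming standard HDP paths (the log file name depends on your hostname):

# Is the NameNode process alive, and what is it bound to?
jps | grep NameNode
netstat -tlnp | grep -E '8020|50070'

# If nothing is listening, the NameNode log should show the real failure:
tail -n 100 /var/log/hadoop/hdfs/hadoop-hdfs-namenode-master.log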

Ambari Server fatal error

WARN [qtp-ambari-agent-66] nio:720 - javax.net.ssl.SSLException: Received fatal alert: unknown_ca
I am using ambari-server version 2.1.0 with JDK 1.8. This error started at the time of registering nodes.
At the same time, the agent is showing this error:
Server at https://ip-xxx-xx-xx-Xx.internal:8440 is not reachable, sleeping for 10 seconds...
Run this command on each node:
sed -i 's/verify=platform_default/verify=disable/' /etc/python/cert-verification.cfg
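That sed line disables certificate verification for the agent's Python runtime, which works around the unknown_ca alert. A hedged follow-up, assuming the standard agent service name: after editing the file, restart the agent on each node so it re-registers:

sudo ambari-agent restart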
