Connection refused error in worker logs - apache storm - apache-storm

I see the below error in worker logs, it happens almost every milliseconds, but the cluster is running fine, I wanted to know what does these error mean and any idea on why this would occur.
This happens on all the worker nodes
2016-05-12T15:32:53.514-0500 b.s.m.n.Client [ERROR] connection attempt 3 to Netty-Client-xxxxx.hq.abc.com/xx.xx.xxx.xx:6700 failed: java.net.ConnectException: Connection refused: xxxxx.hq.abc.com/xx.xx.xxx.xxx:6700
And after some time i see this
2016-05-12T15:44:25.940-0500 b.s.m.n.Client [ERROR] discarding 1 messages because the Netty client to Netty-Client-xxxxx.hq.abc.com/xx.xx.xxx.xxx:6700 is being closed

After struggling forever with this problem, I found that setting storm.local.hostname property in the storm.yaml file solved the issue for me. On my laptop, I set storm.zookeeper.servers, nimbus.host and storm.local.hostname all to "localhost".
I am using version 0.10.2 of storm.

Related

Expected hostname at index 7 for neo4j bolt (3.5.21)

We run neo4j (3.5.21) in an EC2 instance. Today, after I restarted the server, noticed this error:
Expected hostname at index 7: bolt://:7687". Starting Neo4j failed: Component 'org.neo4j.server.AbstractNeoServer$ServerComponentsLifecycleAdapter#75401424' was successfully initialized, but failed to start
Service start logs:
Active database: graph.db
Directories in use:
home: /var/lib/neo4j
config: /etc/neo4j
logs: /var/log/neo4j
plugins: /var/lib/neo4j/plugins
import: /var/lib/neo4j/import
data: /var/lib/neo4j/data
certificates: /var/lib/neo4j/certificates
run: /var/run/neo4j
Starting Neo4j.
WARNING: Max 1024 open files allowed, minimum of 40000 recommended. See the Neo4j manual.
Started neo4j (pid 22577). It is available at http://0.0.0.0:7474/
There may be a short delay until the server is ready.
See /var/log/neo4j/neo4j.log for current status.
This is what I see in neo4j.log:
2022-12-03 20:29:49.886+0000 INFO Bolt enabled on 0.0.0.0:7687.
2022-12-03 20:29:51.968+0000 INFO Started.
2022-12-03 20:29:52.121+0000 INFO Stopping...
2022-12-03 20:29:52.231+0000 INFO Stopped.
2022-12-03 20:29:52.233+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.AbstractNeoServer$ServerComponentsLifecycleAdapter#75401424' was successfully initialized, but failed to start. Please see the attached cause exception "Expected hostname at index 7: bolt://:7687". Starting Neo4j failed: Component 'org.neo4j.server.AbstractNeoServer$ServerComponentsLifecycleAdapter#75401424' was successfully initialized, but failed to start. Please see the attached cause exception "Expected hostname at index 7: bolt://:7687".
org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.AbstractNeoServer$ServerComponentsLifecycleAdapter#75401424' was successfully initialized, but failed to start. Please see the attached cause exception "Expected hostname at index 7: bolt://:7687".
at org.neo4j.server.exception.ServerStartupErrors.translateToServerStartupError(ServerStartupErrors.java:45)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:187)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:124)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:91)
at org.neo4j.server.CommunityEntryPoint.main(CommunityEntryPoint.java:32)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.server.AbstractNeoServer$ServerComponentsLifecycleAdapter#75401424' was successfully initialized, but failed to start. Please see the attached cause exception "Expected hostname at index 7: bolt://:7687".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:473)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:180)
... 3 more
Caused by: org.neo4j.graphdb.config.InvalidSettingException: Unable to construct bolt discoverable URI using '' as hostname: Expected hostname at index 7: bolt://:7687
at org.neo4j.server.rest.discovery.DiscoverableURIs$Builder.add(DiscoverableURIs.java:133)
at org.neo4j.server.rest.discovery.DiscoverableURIs$Builder.lambda$addBoltConnectorFromConfig$1(DiscoverableURIs.java:155)
at java.util.Optional.ifPresent(Optional.java:159)
at org.neo4j.server.rest.discovery.DiscoverableURIs$Builder.addBoltConnectorFromConfig(DiscoverableURIs.java:145)
at org.neo4j.server.rest.discovery.CommunityDiscoverableURIs.communityDiscoverableURIs(CommunityDiscoverableURIs.java:38)
at org.neo4j.server.CommunityNeoServer.lambda$createDBMSModule$0(CommunityNeoServer.java:99)
at org.neo4j.server.modules.DBMSModule.start(DBMSModule.java:59)
at org.neo4j.server.AbstractNeoServer.startModules(AbstractNeoServer.java:249)
at org.neo4j.server.AbstractNeoServer.access$700(AbstractNeoServer.java:102)
at org.neo4j.server.AbstractNeoServer$ServerComponentsLifecycleAdapter.start(AbstractNeoServer.java:541)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
... 5 more
Caused by: java.net.URISyntaxException: Expected hostname at index 7: bolt://:7687
at java.net.URI$Parser.fail(URI.java:2847)
at java.net.URI$Parser.failExpecting(URI.java:2853)
at java.net.URI$Parser.parseHostname(URI.java:3389)
at java.net.URI$Parser.parseServer(URI.java:3235)
at java.net.URI$Parser.parseAuthority(URI.java:3154)
at java.net.URI$Parser.parseHierarchical(URI.java:3096)
at java.net.URI$Parser.parse(URI.java:3052)
at java.net.URI.<init>(URI.java:673)
at org.neo4j.server.rest.discovery.DiscoverableURIs$Builder.add(DiscoverableURIs.java:128)
... 15 more
2022-12-03 20:29:52.243+0000 INFO Neo4j Server shutdown initiated by request
EC2: t3.large
OS: Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-1092-aws x86_64)
I have already tried restarting the server, restarting the service multiple times without any success. We have not changed anything on the networking (vpc, subnet, security groups, network interface, etc)
Curious if there's a config I am missing. Any help will be much appreciated.

java.net.ConnectException error when running yarn

I'm having an error when running yarn on a job. HDFS and Yarn both start up fine, jps shows everything normal, pseudo-distributed mode on HDFS works perfectly, and I have triple and quadruple checked my configuration files. Whenever I attempt to run Yarn, however, this happens:
INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From serverA/IPaddress to serverB:30170 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getNewApplication over null after 6 failover attempts. Trying to failover after sleeping for 44428ms.
Yarn then attempts to connect over and over again until I forcefully quit the process. Any ideas why this is happening?
Can you see yarn web ui?
How did you start hdfs and yarn?
You can try ./sbin/start-all.sh

Storm - Supervisors launched but not connecting to Nimbus

I have a Storm cluster with 1 Nimbus, 4 Supervisors and 2 Zookeeper nodes. My Storm.yaml is as following:
storm.zookeeper.servers:
- "storage14"
- "storage15"
nimbus.seeds: ["storage01"]
#storm.local.hostname: "storage05"
supervisor.supervisors:
- "storage02"
- "storage03"
- "storage04"
- "storage05"
storm.local.dir: "/tmp/storm"
worker.childopts: "-Xmx%HEAP-MEM%m -XX:+PrintGCDetails -Xloggc:artifacts/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=artifacts/heapdump"
This storm.yaml file is used by both Nimbus and Supervisors. When Nimbus is started I have the storm.local.hostname commented out as is shown above.
However, when starting Supervisors on respective nodes, I uncomment the storm.local.hostname and set it to the hostname of the node on which the supervisor is being launched. For instance if I was launching the supervisor on storage05, the storm.yaml file would have the following additional config param:
storm.local.hostname: "storage05"
The problem is even though Nimubs is launched successfully and I can see it on the Storm UI, some supervisors do not seem to be able to connect to Nimbus. For instance of the 4 nodes I start supervisors on, Storm UI often shows only 2 of them connected. However, if I ssh in to these nodes and run jps, I can see that the supervisor process is running on ALL of these nodes.
The Supervisors at the nodes which do end up connecting are not the same always, so it is definitely not a problem with those specific nodes.
Another thing to notice is if I try to execute a topology on whatever nodes that got connected, it does not get registered by the cluster and I can not see that topology on the UI either.
What do you think might be causing this erratic behavior?
UPDATE:
Tail end of nimbus.log has the following lines
2017-01-25 00:04:25.216 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-01-25 00:04:25.317 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server storage15/192.168.140.195:2181. Will not attempt to authenticate using SASL (unknown error)
2017-01-25 00:04:25.317 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-01-25 00:04:25.686 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server storage15/192.168.140.195:2181. Will not attempt to authenticate using SASL (unknown error)
2017-01-25 00:04:25.686 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-01-25 00:04:25.787 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server storage14/192.168.140.194:2181. Will not attempt to authenticate using SASL (unknown error)
2017-01-25 00:04:25.787 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
Your UPDATE (nimbus log) indicates that your Nimbus cannot connect Zookeeper cluster. Please check that Zookeeper cluster (storage14/storage15) is accessible from storage01 (not only node is accessible, but also do telnet to Zookeeper server via "telnet storage14 (and/or storage15) 2181").
When ZK connectivity issue is gone please try starting supervisor again.

There are something wrong about Hadoop cluster

I have build a hadoop cluster on ECS on Aliyun of Alibaba.com( it's like AWS). The OS is Ubuntu12.04 . The version of Hadoop is 2.7.1
The cluster is consisted of one master and two slaves.
I can start it successfully. Every node can work well, and I
can use ssh to access two slave node from master node.
Every node is started.
But when I run the wordcount program, there is something wrong. The
error is as following:
exception: java.net.ConnectException: Call From master/10.144.52.189 to localhost:38635 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
When I added Port 38635 in the file /etc/ssh/sshd_config, I run the wordcount program again. The error is still existed, the only difference is the Port 38635 changed.
exception: java.net.ConnectException: Call From master/10.144.52.189 to localhost:46656 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
How to fix this problem? The ports 38635 and 46656 are added in /etc/ssh/sshd_config, the error occurs when run the wordcount program with a new port in the error information.

DB connection issue with Playframework 2.1 and Bonecp 0.8.0 : This connection has been closed

I was facing an issue with Bonecp 0.7.1 on a Playframework app using postgresql 9.2.4 on Heroku. It seems this version had a DB connection leak causing after several DB accesses the folllwing error :
[error] c.j.b.h.AbstractConnectionHook - Failed to acquire connection Sleeping for 1000ms and trying again. Attempts left: 1. Exception: null.Message:FATAL: too many connections for role "eonqhnjenuislk" Database warning
[error] c.j.b.PoolWatchThread - Error in trying to obtain a connection. Retrying in 1000ms
org.postgresql.util.PSQLException: FATAL: too many connections for role "eonqhnjenuislk"
at org.postgresql.core.v3.ConnectionFactoryImpl.readStartupMessages(ConnectionFactoryImpl.java:469) ~[postgresql-9.1-901.jdbc4.jar:na]
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:112) ~[postgresql-9.1-901.jdbc4.jar:na]
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:66) ~[postgresql-9.1-901.jdbc4.jar:na]
at org.postgresql.jdbc2.AbstractJdbc2Connection.<init>(AbstractJdbc2Connection.java:125) ~[postgresql-9.1-901.jdbc4.jar:na]
at org.postgresql.jdbc3.AbstractJdbc3Connection.<init>(AbstractJdbc3Connection.java:30) ~[postgresql-9.1-901.jdbc4.jar:na]
at org.postgresql.jdbc3g.AbstractJdbc3gConnection.<init>(AbstractJdbc3gConnection.java:22) ~[postgresql-9.1-901.jdbc4.jar:na]
As every threads of the connection pool was acquired and retained, the application was not reachable anymore until I restarted it.
Then I heard that this issue was corrected in Bonecp 0.8.0, so I upgraded the lib. But the issue seems not to be completely fixed. In fact, now connection threads are not retained anymore what make the application reachable at anytime but sometimes a DB connection close suddenly... The app throws the following error causing an 500 error to the end users :
javax.persistence.PersistenceException: org.postgresql.util.PSQLException: This connection has been closed.
at com.avaje.ebeaninternal.server.transaction.TransactionManager.createTransaction(TransactionManager.java:331)
at com.avaje.ebeaninternal.server.core.DefaultServer.createServerTransaction(DefaultServer.java:2056)
at com.avaje.ebeaninternal.server.core.BeanRequest.createImplicitTransIfRequired(BeanRequest.java:58)
at com.avaje.ebeaninternal.server.core.PersistRequest.initTransIfRequired(PersistRequest.java:81)
at com.avaje.ebeaninternal.server.persist.DefaultPersister.executeSqlUpdate(DefaultPersister.java:146)
at com.avaje.ebeaninternal.server.core.DefaultServer.execute(DefaultServer.java:1928)
at com.avaje.ebeaninternal.server.core.DefaultServer.execute(DefaultServer.java:1935)
at com.avaje.ebeaninternal.server.core.DefaultSqlUpdate.execute(DefaultSqlUpdate.java:148)
at actor.PublicParkingPlacesActor$1.apply(PublicParkingPlacesActor.java:41)
at actor.PublicParkingPlacesActor$1.apply(PublicParkingPlacesActor.java:26)
at play.libs.F$Promise$PromiseActor.onReceive(F.java:425)
at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:159)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:425)
at akka.actor.ActorCell.invoke(ActorCell.scala:386)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:230)
at akka.dispatch.Mailbox.run(Mailbox.scala:212)
at akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:502)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:262)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1478)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Caused by: org.postgresql.util.PSQLException: This connection has been closed.
at org.postgresql.jdbc2.AbstractJdbc2Connection.checkClosed(AbstractJdbc2Connection.java:714)
at org.postgresql.jdbc2.AbstractJdbc2Connection.setAutoCommit(AbstractJdbc2Connection.java:661)
at com.jolbox.bonecp.ConnectionHandle.setAutoCommit(ConnectionHandle.java:1292)
at play.api.db.BoneCPApi$$anon$1.onCheckOut(DB.scala:328)
at com.jolbox.bonecp.AbstractConnectionStrategy.postConnection(AbstractConnectionStrategy.java:75)
at com.jolbox.bonecp.AbstractConnectionStrategy.getConnection(AbstractConnectionStrategy.java:92)
at com.jolbox.bonecp.BoneCP.getConnection(BoneCP.java:553)
at com.jolbox.bonecp.BoneCPDataSource.getConnection(BoneCPDataSource.java:131)
at play.db.ebean.EbeanPlugin$WrappingDatasource.getConnection(EbeanPlugin.java:146)
at com.avaje.ebeaninternal.server.transaction.TransactionManager.createTransaction(TransactionManager.java:297)
... 20 more
Thanks a lot for your help!
EDIT :
DB configuration :
db.default.isolation=READ_COMMITTED
db.default.partitionCount=2
db.default.maxConnectionsPerPartition=10
db.default.minConnectionsPerPartition=5
db.default.acquireIncrement=1
db.default.acquireRetryAttempts=2
db.default.acquireRetryDelay=5 seconds
db.default.connectionTimeout=10 second
db.default.idleMaxAge=10 minute
db.default.idleConnectionTestPeriod=5 minutes
db.default.initSQL="SELECT 1"
db.default.maxConnectionAge=1 hour
EDIT 2:
Here is the DB config I set according to this post Heroku/Play/BoneCp connection issues
These changes reduce the number of "This connection has been closed" issues, but I still get 1 or 2 of them perdays what makes some HTTP requests to fail. So the issue is still not fixed:
db.default.isolation=READ_COMMITTED
db.default.partitionCount=2
db.default.maxConnectionsPerPartition=10
db.default.minConnectionsPerPartition=5
db.default.acquireIncrement=1
db.default.acquireRetryAttempts=2
db.default.acquireRetryDelay=5 seconds
db.default.connectionTimeout=10 seconds
db.default.idleMaxAge=10 minutes
db.default.idleConnectionTestPeriod=30 seconds
db.default.initSQL="SELECT 1"
db.default.maxConnectionAge=30 minutes

Resources