Spring Boot application unable to recover after JMS connection failure - spring-boot

We have a Spring Boot application which stops retrying to connect to the Solace queues after 3 connection attempts. We get the information below logged, and then the application just does not respond and we have to restart it:
2021-09-15 16:49:08.021 INFO 4444 --- [recovery-thread] bitronix.tm.recovery.Recoverer : recoverer is already running, abandoning this recovery request
2021-09-15 16:50:04.862 INFO 4444 --- [connect_service] c.s.j.protocol.impl.TcpClientChannel : Connection attempt failed to host '<<hostname>>' ReconnectException com.solacesystems.jcsmp.JCSMPSecurityException: Error performing login to LoginContext (*****) cause: javax.security.auth.login.LoginException: *****
2021-09-15 16:50:07.865 INFO 4444 --- [connect_service] c.s.j.protocol.impl.TcpClientChannel : Connecting to host 'orig=tcp://<<hostname>>:55555, scheme=tcp://, host=<<hostname>>, port=55555' (host 1 of 1, smfclient 2, attempt 3 of 3, this_host_attempt: 1 of 1)
2021-09-15 16:50:07.877 INFO 4444 --- [connect_service] c.s.j.protocol.impl.TcpClientChannel : Connection attempt failed to host '<<hostname>>' ReconnectException com.solacesystems.jcsmp.JCSMPSecurityException: Error performing login to LoginContext (*****) cause: javax.security.auth.login.LoginException: *****
2021-09-15 16:50:10.878 INFO 4444 --- [connect_service] c.s.j.protocol.impl.TcpClientChannel : Stale reconnect task, aborting reconnect.
Below is our configuration for connecting to the Solace queues:
spring.jta.bitronix.connectionfactory.className=com.solacesystems.jms.SolXAConnectionFactoryImpl
spring.jta.bitronix.connectionfactory.driverProperties.host=smf://<<hostname>>:55555
spring.jta.bitronix.connectionfactory.driverProperties.VPN=<<vpn>>
spring.jta.bitronix.connectionfactory.driverProperties.authenticationScheme=AUTHENTICATION_SCHEME_GSS_KRB
spring.jta.bitronix.connectionfactory.driverProperties.KRBServiceName=HOST
In our service class we simply autowire a JmsTemplate and publish messages to the queue.
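For reference, the publishing logic is essentially a minimal sketch like the following (the class and queue name are placeholders, not our real ones):
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jms.core.JmsTemplate;
import org.springframework.stereotype.Service;

@Service
public class MessagePublisher {

    @Autowired
    private JmsTemplate jmsTemplate;

    public void publish(String payload) {
        // Publish to the Solace queue; "sample.queue" is a placeholder name
        jmsTemplate.convertAndSend("sample.queue", payload);
    }
}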
I went through some documentation and tried adding the configuration below:
spring.jta.bitronix.connectionfactory.ignore-recovery-failures=true
But I am still facing the same issue. Any suggestions?
====Edit
I face this issue only when I put my laptop in airplane mode and reconnect. If I just disconnect from the VPN and connect back, the Solace connection is re-established.

The SolXAConnectionFactory interface allows you to tune the connect and reconnect parameters. Docs here.
You'll want to check out these and maybe a few others (see the sketch after this list). I suggest searching the javadoc for "retry" and "retries":
connectRetries
connectRetriesPerHost
connectTimeoutInMillis
reconnectRetries
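For a programmatic flavour of the same tuning, a rough sketch (assuming the SolJmsUtility factory helper and the setters shown; values are illustrative, and -1 is commonly used to mean "retry forever") would be:
import com.solacesystems.jms.SolJmsUtility;
import com.solacesystems.jms.SolXAConnectionFactory;

public class SolaceFactoryConfig {

    public static SolXAConnectionFactory connectionFactory() throws Exception {
        SolXAConnectionFactory factory = SolJmsUtility.createXAConnectionFactory();
        factory.setHost("smf://<<hostname>>:55555"); // same host as in the driverProperties above
        factory.setVPN("<<vpn>>");
        factory.setConnectRetries(-1);               // keep retrying the initial connect
        factory.setReconnectRetries(-1);             // keep retrying after a connection drop
        return factory;
    }
}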

I did more research and found the following helpful and will try it in my application: https://solace.community/discussion/917/why-won-t-my-solace-enterprise-application-reconnect-after-an-ha-failover To set it via JNDI, I think this should also be configured at SolAdmin -> JMS Administration -> connection factory -> Transport Properties.

After going through the various documentation and some trial and error, the properties below turned out to be useful. Hope this helps somebody:
spring.jta.bitronix.connectionfactory.driverProperties.reconnectRetries = -1
spring.jta.bitronix.connectionfactory.driverProperties.connectRetries = -1
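If the retry attempts fire too quickly back to back, the wait between reconnect attempts can also be tuned. Assuming the Bitronix driverProperties map straight onto the corresponding factory setter, a line like the following should work (3000 ms is just an example value):
spring.jta.bitronix.connectionfactory.driverProperties.reconnectRetryWaitInMillis = 3000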

Related

Connection Refused for Consul

I am starting a Spring Boot application with Consul.
I am getting the following error:
2019-08-30 12:34:22.650 ERROR 23428 --- [ main] o.s.boot.SpringApplication : Application run failed
com.ecwid.consul.transport.TransportException: org.apache.http.conn.HttpHostConnectException: Connect to localhost:8090 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] failed: Connection refused: connect
I changed the default port in the bootstrap.properties file.
I also tried another, non-Consul Spring Boot application, and it worked fine for that use case on the same port.
8090 is not the default port for Consul. You didn't say if your other successful app was on the same host or not, but make sure Consul is actually listening on that port with netstat or ss.
By default, Consul listens for API requests on port 8500.
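If Consul really is running on a non-default port, point the application at it explicitly in bootstrap.properties. A minimal sketch, assuming the standard Spring Cloud Consul properties (adjust host and port to your setup):
spring.cloud.consul.host=localhost
spring.cloud.consul.port=8500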

Apache NiFi closes immediately with error, org.apache.nifi.StdErr Failed to start web server

I recently installed NiFi, and it was working fine for a few days. But suddenly today when I try to open it using run-nifi.bat, the NiFi window closes within a few seconds with the error below:
2019-04-11 23:07:40,146 WARN [NiFi Bootstrap Command Listener] org.apache.nifi.bootstrap.RunNiFi Failed to set permissions so that only the owner can read status file C:\Users\DOWNLO~1\NIFI-1~1.1-B\NIFI-1~1.1\bin\..\run\nifi.status; this may allows others to have access to the key needed to communicate with NiFi. Permissions should be changed so that only the owner can read this file
2019-04-11 23:07:40,149 INFO [NiFi Bootstrap Command Listener] org.apache.nifi.bootstrap.RunNiFi Apache NiFi now running and listening for Bootstrap requests on port 54149
2019-04-11 23:08:00,352 ERROR [NiFi logging handler] org.apache.nifi.StdErr Failed to start web server: Must configure HTTP or HTTPS connector
2019-04-11 23:08:00,352 ERROR [NiFi logging handler] org.apache.nifi.StdErr Shutting down...
2019-04-11 23:08:00,419 INFO [main] org.apache.nifi.bootstrap.RunNiFi NiFi never started. Will not restart NiFi
I did look up the org.apache.nifi.StdErr Failed to start web server: Must configure HTTP or HTTPS connector error, but unfortunately I can't find a similar one. I'm sure that no settings or properties have been changed since installation. Any suggestions?
I was getting the same error.
You need to check the nifi-app.log file to get more details on this type of error.
Here's what I did: remove the port information from nifi.properties for HTTPS and only keep the setting for HTTP. Then restart NiFi.
Keep only one of the properties enabled, either HTTPS or HTTP. In my case it was https://127.0.0.1:8443/nifi/ and it works fine for me.
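As a rough sketch, the web section of nifi.properties then looks something like this for plain HTTP (host and port values are illustrative), with the HTTPS entries left blank:
nifi.web.http.host=127.0.0.1
nifi.web.http.port=8080
nifi.web.https.host=
nifi.web.https.port=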

JMS ActiveMQ SpringBoot .FailoverTransport

I am trying to connect to a remote broker URL in ActiveMQ (ActiveMQ is installed in a Unix VM).
I am able to connect from the browser on my laptop.
While running the Spring Boot application I am getting this error:
--- [ActiveMQ Task-1] o.a.a.t.failover.FailoverTransport : Failed to connect to [tcp://http://199.247.18.11:61616] after: 8 attempt(s) continuing to retry.
What could be the issue?
Please remove the http:// from your connection string. Port 61616 is expecting JMS connections.
Your connection string should be tcp://199.247.18.11:61616 or something similar. There is a REST API that (I think) goes through the built-in HTTP server, but it's not going to listen on 61616 and it's going to have a much longer URL. Something like
http://admin:admin@localhost:8161/api/message?destination=queue://myqueue
Still having the issue.
My yml file:
activemq:
broker-url: failover:(tcp://http://199.247.18.11:61616)?initialReconnectDelay=1000&maxReconnectDelay=60000&warnAfterReconnectAttempts=2
error:
2018-05-01 07:41:51.312 WARN 6560 --- [ActiveMQ Task-1] o.a.a.t.failover.FailoverTransport : Failed to connect to [tcp://http://199.247.18.11:61616] after: 2 attempt(s) continuing to retry.
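Note that the broker-url above still embeds http:// inside the tcp transport URI, which is what the failover transport keeps choking on. A corrected version of that yml entry (everything else unchanged) would look like:
activemq:
  broker-url: failover:(tcp://199.247.18.11:61616)?initialReconnectDelay=1000&maxReconnectDelay=60000&warnAfterReconnectAttempts=2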

Storm - Supervisors launched but not connecting to Nimbus

I have a Storm cluster with 1 Nimbus, 4 Supervisors and 2 Zookeeper nodes. My storm.yaml is as follows:
storm.zookeeper.servers:
- "storage14"
- "storage15"
nimbus.seeds: ["storage01"]
#storm.local.hostname: "storage05"
supervisor.supervisors:
- "storage02"
- "storage03"
- "storage04"
- "storage05"
storm.local.dir: "/tmp/storm"
worker.childopts: "-Xmx%HEAP-MEM%m -XX:+PrintGCDetails -Xloggc:artifacts/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=artifacts/heapdump"
This storm.yaml file is used by both Nimbus and the Supervisors. When Nimbus is started I have storm.local.hostname commented out, as shown above.
However, when starting Supervisors on respective nodes, I uncomment the storm.local.hostname and set it to the hostname of the node on which the supervisor is being launched. For instance if I was launching the supervisor on storage05, the storm.yaml file would have the following additional config param:
storm.local.hostname: "storage05"
The problem is that even though Nimbus is launched successfully and I can see it on the Storm UI, some supervisors do not seem to be able to connect to Nimbus. For instance, of the 4 nodes I start supervisors on, the Storm UI often shows only 2 of them connected. However, if I ssh into these nodes and run jps, I can see that the supervisor process is running on ALL of them.
The supervisors that do end up connecting are not always the same ones, so it is definitely not a problem with those specific nodes.
Another thing to note is that if I try to submit a topology to whichever nodes did connect, it does not get registered by the cluster and I cannot see the topology on the UI either.
What do you think might be causing this erratic behavior?
UPDATE:
Tail end of nimbus.log has the following lines
2017-01-25 00:04:25.216 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-01-25 00:04:25.317 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server storage15/192.168.140.195:2181. Will not attempt to authenticate using SASL (unknown error)
2017-01-25 00:04:25.317 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-01-25 00:04:25.686 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server storage15/192.168.140.195:2181. Will not attempt to authenticate using SASL (unknown error)
2017-01-25 00:04:25.686 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-01-25 00:04:25.787 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server storage14/192.168.140.194:2181. Will not attempt to authenticate using SASL (unknown error)
2017-01-25 00:04:25.787 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
Your UPDATE (the nimbus log) indicates that Nimbus cannot connect to the ZooKeeper cluster. Please check that the ZooKeeper cluster (storage14/storage15) is accessible from storage01: not just that the nodes are reachable, but also that the ZooKeeper port answers, e.g. "telnet storage14 (and/or storage15) 2181".
Once the ZK connectivity issue is gone, please try starting the supervisors again.
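If telnet is not installed on those hosts, a quick plain-Java check (hostnames and port taken from the storm.yaml above) tells you the same thing:
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ZkReachabilityCheck {
    public static void main(String[] args) {
        for (String host : new String[] {"storage14", "storage15"}) {
            try (Socket socket = new Socket()) {
                // Plain TCP connect to the ZooKeeper client port with a 3-second timeout
                socket.connect(new InetSocketAddress(host, 2181), 3000);
                System.out.println(host + ":2181 is reachable");
            } catch (IOException e) {
                System.out.println(host + ":2181 is NOT reachable: " + e.getMessage());
            }
        }
    }
}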

Squirrel Client Connecting to Phoenix - Timeout Exception

I am trying to connect to Phoenix via the SQuirreL client. I am receiving the following entries in the SQuirreL logs. They suggest that the client connection to ZooKeeper is established, but it fails with a timeout exception when the SQL client connection is being established.
I have copied the Phoenix client JAR into the lib directory of SQuirreL and the driver is registered successfully. Also, when I run the sqlline.py utility on the localhost, it loads the SQL command line to Phoenix successfully and I can run commands. I added the Phoenix core JARs to the $HBASE_HOME/lib folder as well.
2015-06-15 12:48:53,766 [pool-7-thread-1] INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper - Process identifier=hconnection-0x776a1002 connecting to ZooKeeper ensemble=10.58.126.245:2181
2015-06-15 12:48:53,766 [pool-7-thread-1] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=10.58.126.245:2181 sessionTimeout=90000 watcher=hconnection-0x776a10020x0, quorum=10.58.126.245:2181, baseZNode=/hbase
2015-06-15 12:48:58,287 [pool-7-thread-1-SendThread(10.58.126.245:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server 10.58.126.245/10.58.126.245:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-15 12:48:58,301 [pool-7-thread-1-SendThread(10.58.126.245:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to 10.58.126.245/10.58.126.245:2181, initiating session
2015-06-15 12:48:58,314 [pool-7-thread-1-SendThread(10.58.126.245:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server 10.58.126.245/10.58.126.245:2181, sessionid = 0x14df5b87b120040, negotiated timeout = 90000
2015-06-15 12:49:58,100 [pool-7-thread-1] INFO org.apache.hadoop.hbase.client.RpcRetryingCaller - Call exception, tries=10, retries=35, started=59774 ms ago, cancelled=false, msg=
2015-06-15 12:50:20,456 [pool-7-thread-1] INFO org.apache.hadoop.hbase.client.RpcRetryingCaller - Call exception, tries=11, retries=35, started=82130 ms ago, cancelled=false, msg=
2015-06-15 12:50:36,114 [AWT-EventQueue-1] ERROR net.sourceforge.squirrel_sql.client.gui.db.ConnectToAliasCallBack - Unexpected Error occurred attempting to open an SQL connection.
java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask.get(FutureTask.java:201)
at net.sourceforge.squirrel_sql.client.mainframe.action.OpenConnectionCommand.awaitConnection(OpenConnectionCommand.java:132)
at net.sourceforge.squirrel_sql.client.mainframe.action.OpenConnectionCommand.access$100(OpenConnectionCommand.java:45)
at net.sourceforge.squirrel_sql.client.mainframe.action.OpenConnectionCommand$2.run(OpenConnectionCommand.java:115)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
I have the same problem and haven't found the solution yet, but I managed to use the "thin" client instead:
Start the Phoenix Query Server (https://phoenix.apache.org/server.html); it should listen on port 8765.
Copy the phoenix-4.6.0-HBase-1.1-thin-client JAR to the SQuirreL lib folder.
Create a new driver; the class name is "org.apache.phoenix.queryserver.client.Driver".
Connect with this driver (my URI: jdbc:phoenix:thin:url=http://docker:8765).
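With the thin driver registered, a minimal standalone JDBC test looks roughly like this (the URL mirrors the one above; depending on the Phoenix version you may also need a serialization parameter):
import java.sql.Connection;
import java.sql.DriverManager;

public class PhoenixThinClientTest {
    public static void main(String[] args) throws Exception {
        // The thin client talks to the Phoenix Query Server over HTTP on port 8765
        Class.forName("org.apache.phoenix.queryserver.client.Driver");
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:thin:url=http://docker:8765")) {
            System.out.println("Connected: " + !conn.isClosed());
        }
    }
}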
