What causes CometD to time out a session so soon? (websocket)

I'm having trouble with a websocket client connecting to our CometD server. I can see it connects OK and the handshake succeeds, but after 160 ms CometD decides the session has timed out and removes it:
05:45:45.597 [00003] INFO - canHandshake() result true
05:45:45.597 [00003] INFO - Websocket session-added : 86db64o1yu7v5elbwdkgg595e
05:45:45.597 [00003] INFO - Registering websocket session : session {id=86db64o1yu7v5elbwdkgg595e,cid=0,appId=GMCN01,rid=90C3301D-0295-633
05:45:45.598 [00003] INFO - Registering websocket session : 86db64o1yu7v5elbwdkgg595e for registrationId 90C3301D-0295-6336-351C-45C8884DD
05:45:45.598 [00003] INFO - storing session info for client 0 # with host http://10.200.1.87:8081/websocket/transmit?sId=86db64o1yu7v5elbw
05:45:45.605 [00003] INFO - << {minimumVersion=1.0, supportedConnectionTypes=[websocket, callback-polling, long-polling], successful=true,
05:45:45.606 [00003] INFO - < {minimumVersion=1.0, supportedConnectionTypes=[websocket, callback-polling, long-polling], successful=true,
05:45:46.135 [00004] INFO - > {connectionType=websocket, channel=/meta/connect, clientId=86db64o1yu7v5elbwdkgg595e} 86db64o1yu7v5elbwdkgg
05:45:46.136 [00004] INFO - >> {connectionType=websocket, channel=/meta/connect, clientId=86db64o1yu7v5elbwdkgg595e}
05:45:46.136 [00004] INFO - << {successful=true, advice={interval=0, reconnect=retry, timeout=30000}, channel=/meta/connect}
05:45:46.136 [00004] INFO - < {successful=true, advice={interval=0, reconnect=retry, timeout=30000}, channel=/meta/connect}
05:45:46.296 [00007] INFO - Removing session 86db64o1yu7v5elbwdkgg595e - last connect 160 ms ago, timed out: true <--- THIS IS VERY ODD
05:45:46.296 [00007] INFO - Websocket session-removed : (t/o=true) 86db64o1yu7v5elbwdkgg595e
My own test client appears to work OK, but perhaps only because I am closer to the servers and the latency is lower. The client that is failing is in another region, yet the logs do not indicate any unusual delay, and a 160 ms timeout seems far too small either way.
I'm using Java CometD 2.6.0 embedded with Jetty 8.1.12.
I suspect a timeout setting is too small, but I'm not sure which one controls this, or whether something else is causing the session removal.
Has anyone else seen this, or can anyone explain why it is happening?

Embarrassingly, I found that ws.maxInterval was set to 25 instead of 25000. Since the value is in milliseconds, the server was expiring any session that had not reconnected within 25 ms. That was the issue.
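For reference, a minimal sketch of where this option lives, assuming the embedded server is constructed as a BayeuxServerImpl in code (the same option can also be set as a ws.maxInterval init-param on the CometD servlet):

import org.cometd.server.BayeuxServerImpl;

public class CometdTimeoutConfig {
    public static void main(String[] args) {
        BayeuxServerImpl bayeux = new BayeuxServerImpl();
        // maxInterval is in milliseconds: it is how long the server waits for
        // the next /meta/connect before expiring the session. A value of 25
        // expires sessions 25 ms after the last connect, which matches the
        // "last connect 160 ms ago, timed out: true" removal in the log.
        bayeux.setOption("ws.maxInterval", "25000");
    }
}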

Related

Spring Boot application unable to recover after JMS connection failure

We have a Spring Boot application which stops retrying to connect to the Solace queues after 3 connection attempts. The information below is logged, after which the application simply stops responding and we have to restart it:
2021-09-15 16:49:08.021 INFO 4444 --- [recovery-thread] bitronix.tm.recovery.Recoverer : recoverer is already running, abandoning this recovery request
2021-09-15 16:50:04.862 INFO 4444 --- [connect_service] c.s.j.protocol.impl.TcpClientChannel : Connection attempt failed to host '<<hostname>>' ReconnectException com.solacesystems.jcsmp.JCSMPSecurityException: Error performing login to LoginContext (*****) cause: javax.security.auth.login.LoginException: *****
2021-09-15 16:50:07.865 INFO 4444 --- [connect_service] c.s.j.protocol.impl.TcpClientChannel : Connecting to host 'orig=tcp://<<hostname>>:55555, scheme=tcp://, host=<<hostname>>, port=55555' (host 1 of 1, smfclient 2, attempt 3 of 3, this_host_attempt: 1 of 1)
2021-09-15 16:50:07.877 INFO 4444 --- [connect_service] c.s.j.protocol.impl.TcpClientChannel : Connection attempt failed to host '<<hostname>>' ReconnectException com.solacesystems.jcsmp.JCSMPSecurityException: Error performing login to LoginContext (*****) cause: javax.security.auth.login.LoginException: *****
2021-09-15 16:50:10.878 INFO 4444 --- [connect_service] c.s.j.protocol.impl.TcpClientChannel : Stale reconnect task, aborting reconnect.
Below is our configuration for connecting to the Solace queues:
spring.jta.bitronix.connectionfactory.className=com.solacesystems.jms.SolXAConnectionFactoryImpl
spring.jta.bitronix.connectionfactory.driverProperties.host=smf://<<hostname>>:55555
spring.jta.bitronix.connectionfactory.driverProperties.VPN=<<vpn>>
spring.jta.bitronix.connectionfactory.driverProperties.authenticationScheme=AUTHENTICATION_SCHEME_GSS_KRB
spring.jta.bitronix.connectionfactory.driverProperties.KRBServiceName=HOST
In our service class we simply autowire a JmsTemplate and publish messages to the queue.
I went through some documentation and tried adding the configuration below:
spring.jta.bitronix.connectionfactory.ignore-recovery-failures=true
But I am still facing the same issue. Any suggestions?
Edit:
I face this issue only when I put my laptop in airplane mode and reconnect. If I just disconnect from the VPN and connect back, the Solace connection is re-established.
The SolXAConnectionFactory interface allows you to tune the connect and reconnect parameters; see the Solace JMS docs.
You'll want to check out these and maybe a few others; I suggest searching the javadoc for "retry" and "retries":
connectRetries
connectRetriesPerHost
connectTimeoutInMillis
reconnectRetries
I did more research and found the following helpful, and would try it in my application: https://solace.community/discussion/917/why-won-t-my-solace-enterprise-application-reconnect-after-an-ha-failover To set it via JNDI, I think this should also be configured at SolAdmin -> JMS Administration -> connection factory -> Transport Properties.
After going through the various documentation and some trial and error, the properties below turned out to be useful. Hope this helps somebody:
spring.jta.bitronix.connectionfactory.driverProperties.reconnectRetries = -1
spring.jta.bitronix.connectionfactory.driverProperties.connectRetries = -1
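For anyone configuring the factory in code rather than through the Bitronix driver properties, a hedged sketch of the same settings using the Solace JMS API (SolJmsUtility and these setters are assumed from the Solace javadoc; -1 means retry indefinitely):

import com.solacesystems.jms.SolConnectionFactory;
import com.solacesystems.jms.SolJmsUtility;

public class SolaceRetryConfig {
    public static void main(String[] args) throws Exception {
        SolConnectionFactory factory = SolJmsUtility.createConnectionFactory();
        factory.setHost("smf://<<hostname>>:55555");
        factory.setVPN("<<vpn>>");
        // -1 retries forever; these mirror the two driverProperties above
        factory.setConnectRetries(-1);
        factory.setReconnectRetries(-1);
    }
}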

JMS ActiveMQ Spring Boot FailoverTransport

I am trying to connect to a remote broker URL in ActiveMQ (ActiveMQ is installed in a Unix VM).
I am able to connect from a browser on my laptop.
While running the Spring Boot application I get this error:
--- [ActiveMQ Task-1] o.a.a.t.failover.FailoverTransport : Failed to connect to [tcp://http://199.247.18.11:61616] after: 8 attempt(s) continuing to retry.
What could be the issue?
Please remove the http:// from your connection string. Port 61616 is expecting JMS connections.
Your connection string should be tcp://199.247.18.11:61616 or something similar. There is a REST API that (I think) goes through the built-in HTTP server, but it's not going to listen on 61616 and it will have a much longer URL, something like:
http://admin:admin@localhost:8161/api/message?destination=queue://myqueue
Still the same issue. My yml file:
activemq:
  broker-url: failover:(tcp://http://199.247.18.11:61616)?initialReconnectDelay=1000&maxReconnectDelay=60000&warnAfterReconnectAttempts=2
error:
2018-05-01 07:41:51.312 WARN 6560 --- [ActiveMQ Task-1] o.a.a.t.failover.FailoverTransport : Failed to connect to [tcp://http://199.247.18.11:61616] after: 2 attempt(s) continuing to retry.
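Note that the broker-url above still contains the stray http:// that the answer pointed out, which is why the failover transport keeps retrying. A minimal sketch with the corrected URL (the same string also works as the Spring Boot broker-url value; host and options are taken from the question):

import javax.jms.Connection;
import org.apache.activemq.ActiveMQConnectionFactory;

public class BrokerUrlFix {
    public static void main(String[] args) throws Exception {
        String brokerUrl = "failover:(tcp://199.247.18.11:61616)"
                + "?initialReconnectDelay=1000&maxReconnectDelay=60000"
                + "&warnAfterReconnectAttempts=2";
        ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(brokerUrl);
        Connection connection = factory.createConnection();
        connection.start(); // retries per the failover options if the broker is down
    }
}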

AMQP 1.0 Qpid JMS and an Issue with Failover/Reconnect

I'm using the Qpid JMS 0.8.0 library to implement a standalone Java AMQP client. Because the underlying transport connection tends to break every couple of hours, I have configured reconnection as follows:
failover:(amqps://someurl:5671)?failover.reconnectDelay=2000&failover.warnAfterReconnectAttempts=1
According to the Qpid client configuration page, I expect my client to keep trying to reconnect, increasing the delay between attempts by a factor of 2 (starting at 2 seconds). Instead, the log file shows that only two reconnect attempts were performed after the connection failure was detected, and then the whole client application terminated, which I definitely want to avoid! Here is the log file:
2016-03-22 14:29:40 INFO AmqpProvider:1190 - IdleTimeoutCheck closed the transport due to the peer exceeding our requested idle-timeout.
2016-03-22 14:29:40 DEBUG FailoverProvider:761 - Failover: the provider reports failure: Transport closed due to the peer exceeding our requested idle-timeout
2016-03-22 14:29:40 DEBUG FailoverProvider:519 - handling Provider failure: Transport closed due to the peer exceeding our requested idle-timeout
2016-03-22 14:29:40 DEBUG FailoverProvider:653 - Connection attempt:[1] to: amqps://publish.preops.nm.eurocontrol.int:5671 in-progress
2016-03-22 14:29:40 INFO FailoverProvider:659 - Connection attempt:[1] to: amqps://publish.preops.nm.eurocontrol.int:5671 failed
2016-03-22 14:29:40 WARN FailoverProvider:686 - Failed to connect after: 1 attempt(s) continuing to retry.
2016-03-22 14:29:42 DEBUG FailoverProvider:653 - Connection attempt:[2] to: amqps://publish.preops.nm.eurocontrol.int:5671 in-progress
2016-03-22 14:29:42 INFO FailoverProvider:659 - Connection attempt:[2] to: amqps://publish.preops.nm.eurocontrol.int:5671 failed
2016-03-22 14:29:42 WARN FailoverProvider:686 - Failed to connect after: 2 attempt(s) continuing to retry.
2016-03-22 14:29:43 DEBUG ThreadPoolUtils:156 - Shutdown of ExecutorService: java.util.concurrent.ThreadPoolExecutor#778970af[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0] is shutdown: true and terminated: true took: 0.000 seconds.
2016-03-22 14:29:45 DEBUG ThreadPoolUtils:192 - Waited 2.004 seconds for ExecutorService: java.util.concurrent.ScheduledThreadPoolExecutor#877a470[Shutting down, pool size = 1, active threads = 0, queued tasks = 1, completed tasks = 3] to terminate...
2016-03-22 14:29:46 DEBUG ThreadPoolUtils:156 - Shutdown of ExecutorService: java.util.concurrent.ScheduledThreadPoolExecutor#877a470[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 4] is shutdown: true and terminated: true took: 2.889 seconds.
Any idea what I'm doing wrong here? Basically, what I'm trying to achieve is a client that detects transport connection failures and tries to reconnect every 5-10 seconds.
Many thanks!
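For that stated goal, a hedged sketch using the failover options documented on the Qpid JMS configuration page (the broker URL is the question's own; the exact option set should be verified against the client version in use):

import javax.jms.Connection;
import org.apache.qpid.jms.JmsConnectionFactory;

public class QpidReconnectConfig {
    public static void main(String[] args) throws Exception {
        // Reconnect attempts continue indefinitely by default; the delay
        // starts at 5 s and the back-off is capped at 10 s.
        JmsConnectionFactory factory = new JmsConnectionFactory(
                "failover:(amqps://someurl:5671)"
                + "?failover.initialReconnectDelay=5000"
                + "&failover.maxReconnectDelay=10000"
                + "&failover.warnAfterReconnectAttempts=1");
        Connection connection = factory.createConnection();
        connection.start();
    }
}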

Squirrel Client Connecting to Phoenix - Timeout Exception

I am trying to connect to Phoenix via the SQuirreL client and am seeing the following entries in the SQuirreL logs. They suggest that the client connection to ZooKeeper is established, but the SQL connection then fails with a timeout exception.
I have copied the Phoenix client jar into the lib directory of SQuirreL and the driver registers successfully. Also, when I run the sqlline.py utility on localhost it loads the Phoenix SQL command line successfully and I can run commands. I added the Phoenix core jars to the $HBASE_HOME/lib folder as well.
2015-06-15 12:48:53,766 [pool-7-thread-1] INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper - Process identifier=hconnection-0x776a1002 connecting to ZooKeeper ensemble=10.58.126.245:2181
2015-06-15 12:48:53,766 [pool-7-thread-1] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=10.58.126.245:2181 sessionTimeout=90000 watcher=hconnection-0x776a10020x0, quorum=10.58.126.245:2181, baseZNode=/hbase
2015-06-15 12:48:58,287 [pool-7-thread-1-SendThread(10.58.126.245:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server 10.58.126.245/10.58.126.245:2181. Will not attempt to authenticate using SASL (unknown error)
2015-06-15 12:48:58,301 [pool-7-thread-1-SendThread(10.58.126.245:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to 10.58.126.245/10.58.126.245:2181, initiating session
2015-06-15 12:48:58,314 [pool-7-thread-1-SendThread(10.58.126.245:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server 10.58.126.245/10.58.126.245:2181, sessionid = 0x14df5b87b120040, negotiated timeout = 90000
2015-06-15 12:49:58,100 [pool-7-thread-1] INFO org.apache.hadoop.hbase.client.RpcRetryingCaller - Call exception, tries=10, retries=35, started=59774 ms ago, cancelled=false, msg=
2015-06-15 12:50:20,456 [pool-7-thread-1] INFO org.apache.hadoop.hbase.client.RpcRetryingCaller - Call exception, tries=11, retries=35, started=82130 ms ago, cancelled=false, msg=
2015-06-15 12:50:36,114 [AWT-EventQueue-1] ERROR net.sourceforge.squirrel_sql.client.gui.db.ConnectToAliasCallBack - Unexpected Error occurred attempting to open an SQL connection.
java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask.get(FutureTask.java:201)
at net.sourceforge.squirrel_sql.client.mainframe.action.OpenConnectionCommand.awaitConnection(OpenConnectionCommand.java:132)
at net.sourceforge.squirrel_sql.client.mainframe.action.OpenConnectionCommand.access$100(OpenConnectionCommand.java:45)
at net.sourceforge.squirrel_sql.client.mainframe.action.OpenConnectionCommand$2.run(OpenConnectionCommand.java:115)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
I have the same problem and haven't found the solution yet, but I managed to use the "thin" client instead:
Start the query server (https://phoenix.apache.org/server.html); it should listen on port 8765
Copy the phoenix-4.6.0-HBase-1.1-thin-client JAR to the SQuirreL lib folder
Create a new driver; the class name is "org.apache.phoenix.queryserver.client.Driver"
Connect with this driver (my URI: jdbc:phoenix:thin:url=http://docker:8765)
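A minimal sketch of the same connection made from plain JDBC, using the driver class and URI from the steps above (the docker host name is this answer's example):

import java.sql.Connection;
import java.sql.DriverManager;

public class PhoenixThinClient {
    public static void main(String[] args) throws Exception {
        // Register the thin (Phoenix Query Server) driver named above
        Class.forName("org.apache.phoenix.queryserver.client.Driver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:phoenix:thin:url=http://docker:8765")) {
            System.out.println("Connected: " + !conn.isClosed());
        }
    }
}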

Storm worker not starting

I am trying to start a Storm topology, but the Storm worker refuses to start. When I run the Java command which invokes the worker process, I get the following error:
Exception: java.lang.StackOverflowError thrown from the UncaughtExceptionHandler in thread "main"
I am not able to find what is causing this. Has anyone faced a similar issue?
Edit:
When I run the worker process with the -V flag, I get the following error:
588 [main] INFO org.apache.zookeeper.server.ZooKeeperServer - Server environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
588 [main] INFO org.apache.zookeeper.server.ZooKeeperServer - Server environment:java.io.tmpdir=/tmp
588 [main] INFO org.apache.zookeeper.server.ZooKeeperServer - Server environment:java.compiler=<NA>
588 [main] INFO org.apache.zookeeper.server.ZooKeeperServer - Server environment:os.name=Linux
588 [main] INFO org.apache.zookeeper.server.ZooKeeperServer - Server environment:os.arch=amd64
588 [main] INFO org.apache.zookeeper.server.ZooKeeperServer - Server environment:os.version=3.5.0-23-generic
588 [main] INFO org.apache.zookeeper.server.ZooKeeperServer - Server environment:user.name=storm
588 [main] INFO org.apache.zookeeper.server.ZooKeeperServer - Server environment:user.home=/home/storm
588 [main] INFO org.apache.zookeeper.server.ZooKeeperServer - Server environment:user.dir=/home/storm/storm-0.9.0.1
797 [main] ERROR org.apache.zookeeper.server.NIOServerCnxn - Thread Thread[main,5,main] died
PS: When I run the same topology on a local cluster it works fine; it fails to start only when I deploy in cluster mode.
Just found out the issue. The jar I created to upload to the Storm cluster was kept in the Storm base directory. This somehow created a conflict which was not shown in the log file; in fact, the log file never got created.
Make sure no external jars are present in the base Storm folder from which you start Storm. A really tricky error, with no idea why it happens until you work around it.
I hope the Storm developers add this to the logs so that users facing this issue can pinpoint exactly why it is happening.
