How to fix "sequel::DatabaseDisconnectError - Mysql::Error: MySQL server has gone away" on Heroku - ruby

I have a simple Sinatra application hosted on Heroku, using Sequel to connect to a MySQL database through the ClearDB addon.
The application works fine, except when it sits idle for more than a minute. In that case, the first request I make gives a "500 Internal Server Error", which heroku logs reveals to be:
Sequel::DatabaseDisconnectError - Mysql::Error: MySQL server has gone away
If I refresh the page after this error, it works fine, and the error will not return until the application sits idle for another minute or so.
The application is running on two dynos, so the problem is not the Heroku dyno idling you might see on a free account. I contacted ClearDB support, and they gave me this advice:
if you are using connection pooling, then you should set the idle timeout at just below 60 seconds and/or set a keep-alive as I mentioned below. If you are not using connection pooling, then you must make sure that the app actually closes connections after queries and doesn't rely on the network timeout to shut them down.
I understand that I could create a cron job to hit the server every 30 seconds or so, but that seems like an inelegant solution to the problem. I don't understand the other suggestion about making sure the application closes connections: I'm just using Sequel to make queries, and I assumed Sequel manages the connections for me under the hood. Do I need to configure it to ensure that it closes connections? How would I do that?

Your connection times out, which is no big deal. Sequel can deal with that situation if you add the connection_validator extension to your DB:
DB.extension(:connection_validator)
As described in the documentation, this extension "detects an invalid connection, […] removes it from the pool and tries the next available connection, creating a new connection if no available connection is valid".

Related

how to terminate inactive websocket connection in passenger

For the past few days we have been struggling with inactive websocket connections. The problem may lie at the network level. I would like to ask if there is any switch/configuration option to set a timeout for websocket connections for Phusion Passenger in standalone mode.
You should probably solve this at the application level, because solving it in other layers will be uglier (those layers have less knowledge about the websocket).
With Passenger Standalone you could try to set max_requests. This should cause application processes to be restarted semi-regularly, and when shutting down a process Passenger should normally abort websocket connections.
If you want more control over the restart period you could also use for example a cron job that executes rolling restarts every so often, which shouldn't be noticeable to users either.
Websockets in Ruby and Passenger (and maybe Node.js as well) aren't "native" to the server. Instead, your application "hijacks" the socket and controls all the details (timeouts, parsing, etc.).
This means that a solution must be implemented in the application layer (or whatever framework you're using), since Passenger doesn't keep any information about the socket any more.
I know this isn't the answer you wanted, but it's the fact of the matter.
Some approaches use native websockets, where the server controls the websocket connections (timeouts, parsing, etc.; e.g. the Ruby MRI iodine server), but mostly websockets are "hijacked" from the server and the application takes full control and ownership of the connections.
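To make "implement it in the application layer" concrete, here is a rough sketch of a Rack app that hijacks the socket and enforces its own idle timeout. It is not Passenger-specific, the class name and the 45-second limit are invented, and the websocket handshake and frame parsing are left out:
class IdleTimeoutSocketApp
  IDLE_TIMEOUT = 45 # seconds; arbitrary value for this sketch
  def call(env)
    return [404, {}, []] unless env["rack.hijack"]
    env["rack.hijack"].call        # take ownership of the raw socket
    io = env["rack.hijack_io"]
    Thread.new do
      loop do
        # Block until data arrives or IDLE_TIMEOUT seconds pass with no traffic.
        ready = IO.select([io], nil, nil, IDLE_TIMEOUT)
        break if ready.nil?                      # idle too long: drop the client
        chunk = io.readpartial(4096) rescue nil  # nil when the peer has closed
        break if chunk.nil?
        # ... hand `chunk` to your websocket frame parser / keep-alive logic ...
      end
      io.close unless io.closed?
    end
    [200, {}, []] # the response is ignored by the server after a full hijack
  end
end
A real handler would also complete the websocket handshake and answer ping frames, but the idle timeout itself is the part Passenger won't do for you.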

JDBC pooling related to ntp sync?

We're having a connection timeout issue with an API pooling connections to an Informix connection manager, which forwards the queries to the appropriate Informix database server.
Recently I set up the mail service and noticed delays in receiving the mail being sent; after troubleshooting I saw that the database server's clock is not synchronized at all with the API server's (2+ minutes difference).
I've read somewhere that time sync is important when using JDBC pooling, but I can't find much information about this on the internet. The timeout kind of makes sense because of the TCP keepalive.
Has anyone experienced this or know anything about it?
Thank you,
Mihai.
It is common to intermix database timestamps and local timestamps. This causes issues when the server times are different. If the mail server is looking for records before the current time, there could be a two minute delay before mail is sent.
Email may be delayed in transit between servers. Check the Received headers to see if there are any unexpected delays. (You will need to compensate for time variances between the servers.)
Normally, you would use NTP to ensure the time is the same on all servers. Within a data center, it should be able to synchronize clocks to within a millisecond or so.
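As a tiny illustration of the first point (sketched with Sequel, since that's what the rest of this page uses, and with an invented outbox table), mixing the two clocks produces exactly this kind of delay:
# Rows are stamped with the *database* clock ...
DB[:outbox].insert(body: "hello", created_at: Sequel.function(:now))
# ... but the poller filters with the *application* clock. If the database
# clock runs 2 minutes ahead, this query finds nothing for ~2 minutes,
# which shows up as a mail delay.
due = DB[:outbox].where { created_at <= Time.now }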

nodeJS being bombarded with reconnections after restart

We have a node instance with about 2500 client socket connections. Everything runs fine, except occasionally something happens to the service (a restart or failover event in Azure); when the node instance comes back up and all the socket connections try to reconnect, the service comes to a halt and the log just shows repeated socket connects/disconnects. Even if we stop the service and start it again, the same thing happens. We currently send out a package to our on-premise servers to kill the users' Chrome sessions, and then everything works fine as users begin logging in again. We have the clients connecting with 'forceNew' and forcing websockets only, rather than the default long polling then upgrade. Has anyone seen this or have any ideas?
In your socket.io client code, you can force the reconnects to be spread out in time more. The two configuration variables that appear to be most relevant here are:
reconnectionDelay
Determines how long socket.io will initially wait before attempting a reconnect (it should back off from there if the server is down for a while). You can increase this to make it less likely they are all trying to reconnect at the same time.
randomizationFactor
This is a number between 0 and 1.0 and defaults to 0.5. It determines how much the above delay is randomly modified to try to make client reconnects be more random and not all at the same time. You can increase this value to increase the randomness of the reconnect timing.
See the socket.io client documentation for more details.
You may also want to explore your server configuration to see whether it is as scalable as possible for moderate bursts of incoming socket requests. While nobody expects a server to handle 2500 simultaneous connection attempts at once, it should be able to queue up the connection requests and serve them as it gets time, rather than immediately failing any incoming connection that can't be handled right away. There is a desirable middle ground where some number of connections are held in a queue (usually controllable by server-side TCP configuration parameters); when the queue gets too large, connections are failed immediately, and socket.io should then back off and try again a little later. Adjusting the variables above tells it to wait longer before retrying.
Also, I'm curious why you are using forceNew; that does not seem like it would help you. Forcing WebSockets only (no initial polling) is a good thing.

Amazon Load Balancers dropping Web Socket connections to TorqueBox

I'm running TorqueBox on Amazon AWS. I've created a load balancer, which does TCP pass-through for websocket connections on port 8675. When I first load the page this seems to work quite nicely, but if I leave the page open for a while, the connection just stops working. I don't get an error message; it just silently ignores any further messages sent over the connection. If I reload the page at this point, everything works fine again.
I've tried connecting to individual nodes in the cluster directly, and the connection does not get dropped in that case, so my suspicion is that it has something to do with the load balancers.
Any ideas what might be causing this?
More information about your specific architecture might be useful, but my first guess is that you should enable session stickiness so that requests from the same host get directed to the same machine on AWS (if a request gets directed to another machine, the protocol would have to be renegotiated).

Rails 3.0 intermittent Connection timed out, execution expired errors

We're on four Amazon EC2 instances (one load balancer, one db, and two app) and are constantly getting random timeouts. We get at least one a day, sometimes more. Here are some examples:
Errno::ETIMEDOUT: Connection timed out - connect(2)
/usr/local/rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/net/smtp.rb:546:in `initialize'
and
Timeout::Error: execution expired
[GEM_ROOT]/gems/activemodel-3.0.9/lib/active_model/attribute_methods.rb:354:in `match'
I'm not sure how to debug these as they are not related to application code or server load. CPU usage usually hovers below 10% with the biggest spike going up to 60%. The spikes are most likely due to running backups and do not correspond with the times of the timeout errors.
How can these types of errors be tracked down?
The first timeout looks like a legit connection timeout sending mail via SMTP. Are you hosting your own SMTP server or using a service?
Looks like SendGrid has been experiencing delays/timeouts over the last couple of days:
We're currently seeing lots of volume in our queues and emails may be delayed for a short period. Stay tuned for updates. #status
Fix for SMTP service timeouts/failures:
Set up a local mail relay that will hold mail and re-send it if there are failures like this. We use a local Postfix relay in production for exactly this problem (so ActionMailer uses sendmail to hand mail to Postfix, which queues it up and delivers it via SMTP relay to SendGrid).
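For reference, the Rails side of that setup is just pointing ActionMailer at the local sendmail binary; Postfix's own relayhost configuration (pointing at SendGrid) lives outside the app. A minimal sketch, assuming Postfix is installed on the app servers:
# config/environments/production.rb
config.action_mailer.delivery_method = :sendmail
config.action_mailer.sendmail_settings = {
  location:  "/usr/sbin/sendmail", # Postfix's sendmail-compatible binary
  arguments: "-i -t"               # take recipients from headers; don't stop at a lone "."
}
config.action_mailer.raise_delivery_errors = true
Mail delivery then succeeds as soon as Postfix accepts the message locally, and any SendGrid timeouts are retried from Postfix's queue instead of bubbling up as Errno::ETIMEDOUT in the request.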
