Long-running Redshift transaction from Ruby - ruby

I run several SQL statements in a transaction using the Ruby pg gem. The problem I've run into is that the connection times out on these queries because of the firewall setup. The solution proposed here does not work, because it requires a JDBC connection string, and I'm in Ruby (JRuby is not an option). Moving the driver program to AWS to get around the firewall is not an option either.
The code that I have is along the following lines:
conn = RedshiftHelper.get_redshift_connection
begin
  conn.transaction do
    # run my queries
  end
ensure
  conn.flush
  conn.finish
end
I'm now looking into the PG asynchronous API. I'm wondering whether I can use is_busy to prevent the firewall from timing out, or something to that effect. I can't find good documentation on the topic, though. I'd appreciate any hints.
PS: I have solved this problem for a single query - I can trigger it asynchronously and track its completion using the system STV_INFLIGHT Redshift table. A transaction does not work this way, as I have to keep the connection open.
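For reference, the single-query workaround looks roughly like this (a sketch only - the SQL is a placeholder and the polling interval is arbitrary):

conn = RedshiftHelper.get_redshift_connection
conn.send_query(my_long_running_sql)  # my_long_running_sql is a placeholder

# Poll without blocking; completion can also be cross-checked via STV_INFLIGHT.
loop do
  conn.consume_input
  break unless conn.is_busy
  sleep 5
end
result = conn.get_result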

OK, I nailed it down. Here are the facts:
Redshift is based on Postgres 8.0. To check this, connect to the Redshift instance using psql and note that it reports "server version 8.0".
Keepalive requests are specified at the level of the TCP socket (link).
Postgres 8.0 does not support the keepalive option in the connection string (link to the 9.0 release changes, section E.19.3.9.1 on libpq).
The pg gem in Ruby is a wrapper around libpq.
Based on the facts above, TCP keepalive cannot be enabled through the Redshift connection string. However, pg lets you retrieve the socket used by the established connection, so even though libpq does not set the keepalive option, we can still enable it manually. The solution:
require 'pg'
require 'socket'

# Proxy that keeps the keepalive-enabled socket and the PG connection together.
class Connection
  attr_accessor :socket, :pg_connection

  def initialize(conn, socket)
    @socket = socket
    @pg_connection = conn
  end

  # Forward everything else (exec, transaction, ...) to the underlying PG connection.
  def method_missing(m, *args, &block)
    @pg_connection.send(m, *args, &block)
  end

  def close
    @socket.close
    @pg_connection.close
  end

  def finish
    @socket.close
    @pg_connection.close
  end
end

def get_connection
  conn = PGconn.open(...)
  socket_descriptor = conn.socket
  socket = Socket.for_fd(socket_descriptor)
  # Use the TCP keep-alive feature
  socket.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, 1)
  # Maximum keep-alive probes before assuming the connection is lost
  socket.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPCNT, 5)
  # Interval (in seconds) between keep-alive probes
  socket.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPINTVL, 2)
  # Maximum idle time (in seconds) before keep-alive probes start
  socket.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPIDLE, 2)
  socket.autoclose = true
  Connection.new(conn, socket)
end
The reason I introduce a proxy Connection class is Ruby's tendency to garbage-collect IO objects (like sockets) once they go out of scope. The connection and the socket therefore need to stay in the same scope, which is what this proxy class achieves. My Ruby knowledge is not deep, so there may be a better way to handle the socket object.
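For completeness, usage then mirrors the original snippet, just going through the wrapper (a sketch):

conn = get_connection
begin
  conn.transaction do
    # run my queries
  end
ensure
  conn.finish # closes both the keepalive socket and the underlying PG connection
end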
This approach works, but I would be happy to learn if there are better/cleaner solutions.

The link you provided has the answer. I think you just want to follow the section at the top, which has settings for three different OSes; pick the one the code is running on (the client of the Amazon service).
Look in the section "To change TCP/IP timeout settings" - the OS in question is the one your code runs on (i.e. the client of the Amazon service, which is probably your server).
Linux — If your client is running on Linux, run the following command as the root user.
-- details omitted --
Windows — If your client runs on Windows, edit the values for the following registry settings under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters:
-- details omitted --
Mac — If your client is a Mac, create or modify the /etc/sysctl.conf file with the following values:
-- details omitted --

Related

Python concurrent requests and mysql connection pool

Earlier I was opening and closing a connection for each request, but then I learned about MySQL connection pooling, which keeps connections open so they can be reused whenever required, saving time. Now my question: the maximum size of a connection pool is 32, so imagine a situation where more than 32 requests are executing concurrently - in that case it will raise an error. Currently I am doing this:
db.py
from mysql.connector import connect, pooling

pool = None

def init():
    global pool
    pool = pooling.MySQLConnectionPool(pool_name='mypool', pool_size=32,
                                       pool_reset_session=True, **dbconfig)

def do_work():
    try:
        con = pool.get_connection()
    except:
        con = connect(**dbconfig)
    # execute queries here
    con.close()
It checks for an available connection in the pool; if all are busy, it opens a new one.
So, is this the best approach, or is there a better alternative?
Note: this is a single-threaded application running entirely on asyncio - basically it's a Discord bot running on a heavily loaded server.
Also, what would be an ideal size for a connection pool? Thanks!

Maintaining websocket connections with Kubernetes / multiple pods (Tornado)

I've run into a bit of a constraint combining my Tornado web server's websocket implementation with my Kubernetes deployment (I'm using GCP). I'm running an autoscaler on my Kubernetes engine that automatically spins up new pods as the load on the server increases - this typically happens for long-running jobs (e.g. training ML models).
The problem is that my pods keep their own local state - this means a user connection can exist between the client and pod 1 while the long-running job is processed on pod 2, which sends its updates from there. These updates get lost because no connection exists between the client and pod 2.
I'm using a relatively boilerplate WS implementation, adding/removing user connections as follows:
from tornado.websocket import WebSocketHandler

class DefaultWebSocket(WebSocketHandler):
    user_connections = set()

    def check_origin(self, origin):
        return True

    def open(self):
        """
        Establish a websocket connection between client and server
        :param self: WebSocket object
        """
        DefaultWebSocket.user_connections.add(self)

    def on_close(self):
        """
        Close websocket connection between client and server
        :param self: WebSocket object
        """
        DefaultWebSocket.user_connections.remove(self)
Any ideas/thoughts - either high- or low-level - on how to solve this problem would be much appreciated! I've looked into things like k8scale.io and socket.io, but would prefer a 'native' solution.

Elixir erlang :ftp.send got stuck

I use the Erlang ftp library in my Elixir project to send files to an FTP server.
I call the send function :ftp.send(pid, '#{local_path}', '#{remote_path}') to upload a file to the FTP server.
Most of the time it uploads files successfully, but sometimes it gets stuck and never moves on to the next line.
According to the docs it should return :ok or {:error, reason}, but it simply hangs at :ftp.send.
Can anyone give me a suggestion? I am not familiar with Erlang.
Version: Elixir 1.7.3 (compiled with Erlang/OTP 21)
The ftp module has two types of timeout, both set when the ftp service is started.
Here is an excerpt from the documentation:
{timeout, Timeout}
Connection time-out. Default is 60000 (milliseconds).
{dtimeout, DTimeout}
Data connect time-out. The time the client waits for the server to connect to the data socket. Default is infinity.
The data connect time-out defaults to infinity, meaning the call would hang if there are network issues. To overcome the problem, I'd suggest setting this value to something meaningful and handling timeouts appropriately in your application:
{:ok, pid} = :ftp.start_service(
  host: '...', timeout: 30_000, dtimeout: 10_000
)
:ftp.send(pid, '#{local_path}', '#{remote_path}')

Keeping TCPSocket open: server or client's responsibility?

I have been reading examples online about Ruby's TCPSocket and TCPServer, but I still can't find the best practice for this. If you have a running TCPServer and you want to keep the socket open across multiple connections/clients, who should be responsible for keeping them open: the server or the clients?
Let's say that you have a TCPServer running:
server = TCPServer.new(8000)
loop do
  client = server.accept
  while line = client.gets
    # process data from client
  end
  client.puts "Response from server"
  client.close # should server close the socket?
end
And Client:
socket = TCPSocket.new 'localhost', 8000
while line = socket.gets
  # process data from server
end
socket.close # should client close the socket here?
All of the examples I have seen have socket.close at the end, which I assume is not what I want, as that would close the connection. The server and clients should maintain an open connection, since they will need to send data back and forth.
PS: I'm pretty much a noob at networking, so kindly let me know if my question sounds completely dumb.
The server is usually responsible for keeping connections open, because the client (being the one connecting to the server) can break the connection at any time.
Servers are usually in charge of everything the client doesn't care about. A video game doesn't really care about the connection to the server as long as it's there; it just wants its data so it can keep running.
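To make that concrete, here is a minimal sketch of a server that keeps each client socket open and only closes it once the client hangs up (one thread per client; purely illustrative):

require 'socket'

server = TCPServer.new(8000)
loop do
  Thread.new(server.accept) do |client|
    # Keep the socket open and keep responding until the client disconnects.
    while line = client.gets
      client.puts "Response from server: #{line.chomp}"
    end
    client.close # reached only after gets returns nil, i.e. the client hung up
  end
end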

HTTP streaming connection (SSE) client disconnect not detected with Sinatra/Thin on Heroku

I am attempting to deploy a Sinatra streaming SSE response application on the Cedar stack. Unfortunately, while it works perfectly in development, once deployed to Heroku the callback or errback never gets called when a connection is closed, leading to the connection pool filling up with stale connections (which never time out, because data is still being sent to them on the server side).
Relevant info from the Heroku documentation:
Long-polling and streaming responses
Cedar supports HTTP 1.1 features such as long-polling and streaming responses. An application has an initial 30 second window to respond with a single byte back to the client. However, each byte transmitted thereafter (either received from the client or sent by your application) resets a rolling 55 second window. If no data is sent during the 55 second window, the connection will be terminated.
If you’re sending a streaming response, such as with server-sent events, you’ll need to detect when the client has hung up, and make sure your app server closes the connection promptly. If the server keeps the connection open for 55 seconds without sending any data, you’ll see a request timeout.
This is exactly what I would like to do -- detect when the client has hung up and close the connection promptly. However, something about the Heroku routing layer seems to prevent Sinatra from detecting the stream-close event as it normally would.
Some sample code that can be used to replicate this:
require 'sinatra/base'

class MyApp < Sinatra::Base
  set :path, '/tmp'
  set :environment, 'production'

  def initialize
    @connections = []
    EM::next_tick do
      EM::add_periodic_timer(1) do
        @connections.each do |out|
          out << "connections: " << @connections.count << "\n"
        end
        puts "*** connections: #{@connections.count}"
      end
    end
  end

  get '/' do
    stream(:keep_open) do |out|
      @connections << out
      puts "Stream opened from #{request.ip} (now #{@connections.size} open)"
      out.callback do
        @connections.delete(out)
        puts "Stream closed from #{request.ip} (now #{@connections.size} open)"
      end
    end
  end
end
I've put a sample app up at http://obscure-depths-3413.herokuapp.com/ using this code to illustrate the problem. When you connect, the connection count increments, but when you disconnect it never goes down. (Full source of the demo with Gemfile etc. is at https://gist.github.com/mroth/5853993.)
I'm at wits' end trying to debug this one. Does anyone know how to fix it?
P.S. There appears to have been a similar bug in Sinatra, but it was fixed a year ago. Also, this issue only occurs in production on Heroku; it works fine when run locally.
P.S.2. This occurs when iterating over the connection objects as well, for example when adding the following code:
EM::add_periodic_timer(10) do
  num_conns = @connections.count
  @connections.reject!(&:closed?)
  new_conns = @connections.count
  diff = num_conns - new_conns
  puts "Purged #{diff} connections!" if diff > 0
end
It works great locally, but the connections never appear as closed on Heroku.
An update: after working directly with the Heroku routing team (who are great guys!), this is now fixed in their new routing layer and should work properly on any platform.
I would do this check by hand: periodically send an alive signal that the client responds to when it receives the message.
Please take a look at this simple chat implementation, https://gist.github.com/tlewin/5708745, which illustrates the concept.
The application communicates with the client using a simple JSON protocol. When the client receives the alive: true message, it posts back a response and the server stores the time of the last communication.
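A rough server-side sketch of that idea, combining the question's periodic timer with a heartbeat write so dead SSE connections surface and get purged (this assumes EventMachine is running and each out stream responds to << and closed?, as in the code above):

EM::add_periodic_timer(15) do
  @connections.each do |out|
    out << ": heartbeat\n\n" unless out.closed? # SSE comment line, ignored by clients
  end
  @connections.reject!(&:closed?) # drop anything that is now marked as closed
end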
