Is this example tcp socket programming sequence of events safe? - ruby

I plan on having two services.
HTTP REST service written in Ruby
JSON RPC service written in Go
The Ruby service will open a TCP socket connection to the Go JSON RPC service. It'll do this for each incoming HTTP request it receives. It will send some data over the socket to the Go service, which will subsequently send the corresponding data back down the socket.
Go code
The Go service would look something like this (simplified):
srv := new(service.App) // this would expose a Process method
rpc.Register(srv)

listener, err := net.Listen("tcp", ":8080")
if err != nil {
    // handle error
}

for {
    conn, err := listener.Accept()
    if err != nil {
        // handle error
    }
    go jsonrpc.ServeConn(conn)
}
Notice we serve the incoming connection using a goroutine, so we can handle requests concurrently.
Ruby code
Below is a simple snippet of Ruby code that demonstrates (in theory) the way I would send data to the Go service:
require "socket"
require "json"
socket = TCPSocket.new "localhost", "8080"
b = {
:method => "App.Process",
:params => [{ :Config => JSON.generate({ :foo => :bar }) }],
:id => "0"
}
socket.write(JSON.dump(b))
response = JSON.load socket.readline
My concern is: will this be a safe sequence of events?
I'm not asking if this will be 'thread safe', because I'm not worried about manipulating shared memory across the goroutines. I'm more concerned about whether my Ruby HTTP service will get back the data it's expecting.
If I have two parallel requests coming into my HTTP service (or maybe the Ruby app is hosted behind a load balancer, so different instances of the HTTP service are handling multiple requests), then I could have instance A send the message Foo to the Go service while instance B sends the message Bar.
The business logic inside the Go service will return different responses depending on its input so I want to be sure that Ruby instance A gets back the correct response for Foo, and B gets back the correct response for Bar.
I assume a socket connection is more like a queue: if instance A makes a request to the Go service first and then B does, but B's response is ready sooner for whatever reason, then the Go service will write the response for B to the socket and instance A of the Ruby app will end up reading the wrong socket data (this is just one possible scenario; I could get lucky and have instance B read the socket data before instance A does).
Solutions?
I'm not sure there is a simple solution to this problem, unless I drop the TCP socket and RPC and instead rely on standard HTTP in the Go service. But I wanted the performance and lower overhead of TCP.
I'm worried the design could get more complicated by maybe having to implement an external queue as a way of synchronising the responses with the Ruby service.
It may be that, because my Ruby service is fundamentally synchronous (HTTP request/response), I have no option but to switch to HTTP for the Go service.
But wanted to double check with the community first just in case I'm missing something obvious.

Yes, this is safe if you create a new connection every time.
That said, there are latent issues with your approach:
TCP connections are rather expensive to establish, so you probably want to re-use connections with a connection pool
If you make too many simultaneous requests you will exhaust ports/open file descriptors which will cause your program to crash
You don't have any timeouts in place, so it's possible to end up with orphaned TCP connections which never complete (either because of something bad on the Go side, or network problems)
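For the timeout point, a minimal sketch on the Go side (reusing the accept loop from the question; the 30-second value is an arbitrary assumption to tune for your workload) would be to put a deadline on each accepted connection:

for {
    conn, err := listener.Accept()
    if err != nil {
        continue // handle error
    }
    // Reads and writes past the deadline fail, so a wedged or orphaned
    // connection errors out instead of hanging forever.
    conn.SetDeadline(time.Now().Add(30 * time.Second))
    go jsonrpc.ServeConn(conn)
}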
I think you'd be better off using HTTP (despite the overhead) since libraries are already written to cope with these problems. HTTP is also much more debuggable since you can just curl an endpoint to test it.
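If you do go the HTTP route, a minimal sketch of exposing the same processing logic over net/http might look like this (the /process path and request shape are assumptions, not anything from your code):

http.HandleFunc("/process", func(w http.ResponseWriter, r *http.Request) {
    var params struct{ Config string }
    if err := json.NewDecoder(r.Body).Decode(&params); err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }
    // call into the same service.App.Process logic here
    json.NewEncoder(w).Encode(map[string]string{"result": "ok"})
})
log.Fatal(http.ListenAndServe(":8080", nil))

Each handler invocation runs on its own goroutine with its own connection, so the request/response pairing problem you describe disappears.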
Personally I'd probably go with gRPC.

Related

How to un-wedge go gRPC bidi-streaming server from the blocking Recv() call?

When serving a bidirectional stream in gRPC in golang, the canonical stream handler looks something like this:
func (s *MyServer) MyBidiRPC(stream somepb.MyServer_MyBidiServer) error {
    for {
        data, err := stream.Recv()
        if err == io.EOF {
            return nil // clean close
        }
        if err != nil {
            return err // some other error
        }
        // do things with data here
    }
}
Specifically, when the handler for the bidi RPC returns, that is the signal to consider the server side closed.
This is a synchronous programming model -- the server stays blocked inside this goroutine (created by the grpc library) while waiting for messages from the client.
Now, I would like to unblock this Recv() call (which ends up calling RecvMsg() on an underlying grpc.ServerStream) and return/close the stream, because the server process has decided that it is done with this client.
Unfortunately, I can find no obvious way to do this:
There's no Close() or CloseSend() or CloseRecv() or Shutdown()-like function on the bidi server interface generated for my service
The context inside the stream, which I can get at with stream.Context(), doesn't expose a user-accessible cancel function
I can't find a way to pass in a context on the "starting side" for a new connection accepted by the grpc.Server, where I could inject my own cancel function
I could close the entire grpc.Server by calling Stop(), but that's not what I want to do -- only this particular client connection (grpc.ServerStream) should be finished.
I could send a message to the client that makes the client in turn shut down the connection. However, this doesn't work if the client has fallen off the network, which would be solved with a timeout, which has to be pretty long to be generally robust. I want it now because I'm impatient and, more importantly, at scale, dangling unresponsive clients can be a high cost.
I could (perhaps) dig through the grpc.ServerStream with reflection until I find the transportStream, and then dig out the cancel function from that and call it. Or dig through the stream.Context() with reflection and make my own cancel function reference to call. Neither of these seems well advised for future maintainers.
But surely these can't be the only options? Deciding that a particular client no longer needs to be connected is not magic space-alien science. How do I close this stream such that the Recv() call un-blocks, from the server process side, without involving a round-trip to the client?
Unfortunately I don't think there is a great way to do what you are asking. Depending on your goal, I think you have two options:
Run Recv in a goroutine and return from the bidi handler when you need it to return. This will close the context and unblock Recv. This is obviously suboptimal, as it requires care because you now have code executing outside the scope of the handler's execution. It is, however, the closest answer I can seem to find.
If you are trying to mitigate the impact of misbehaving clients by instituting timeouts, you might be able to offload the work of this to the framework with KeepaliveEnforcementPolicy and/or KeepaliveParams. This is probably preferable if this aligns with the reason you are hoping to close the connection, but otherwise isn't of much use.
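A minimal sketch of the first option, assuming the generated stream interface from the question, a hypothetical s.done channel that signals when the server is finished with this client, and a stand-in *somepb.Data message type:

func (s *MyServer) MyBidiRPC(stream somepb.MyServer_MyBidiServer) error {
    msgs := make(chan *somepb.Data)
    errs := make(chan error, 1) // buffered so the goroutine can always exit
    go func() {
        for {
            data, err := stream.Recv()
            if err != nil {
                errs <- err
                return
            }
            select {
            case msgs <- data:
            case <-stream.Context().Done():
                return // handler already returned; don't leak this goroutine
            }
        }
    }()
    for {
        select {
        case data := <-msgs:
            _ = data // do things with data here
        case err := <-errs:
            if err == io.EOF {
                return nil // clean close
            }
            return err
        case <-s.done:
            // Returning cancels the stream's context, which in turn
            // unblocks the Recv in the goroutine above.
            return status.Error(codes.Canceled, "server is done with this client")
        }
    }
}

The select on stream.Context().Done() is the "requires care" part: without it, the Recv goroutine could block forever trying to send on msgs after the handler returns.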

Raise exception when TCP connection broken

I'm building a server which accepts connections through TCP (using TCPServer). I mostly just read data (socket.gets.chomp) and write data (socket.print).
socket.gets will return nil if the connection has been closed by the client in the meantime, so .chomp will raise NoMethodError. This is hard to handle specifically, since it's such an unspecific exception - I want to distinguish exceptions caused by connection loss from other causes of NoMethodError, such as me typoing a method.
Ideally, I would receive something more specific such as SocketError whenever trying to interact with a closed socket, rather than just getting back nil. How could I accomplish that?
I have already considered these options:
Write a wrapper for TCPSocket or IO which checks on socket availability before every call (a lot of work to do cleanly considering how many methods there are in IO)
Check each return value for nil (even more effort and code redundancy as my application grows, also I would still .print to the socket when it's already closed)
Monkey patching NilClass for chomp (again only handles this specific use case, and monkey patching should be avoided for clean code)
Being at end of file is not intrinsically an error, nor is it normally understood to mean a "broken" connection like your title says.
For example, HTTP allows multiple requests to be sent over a single connection. After completely reading a request you can read again, and if the connection is closed you'd get nil, which tells you there are no more requests coming. This situation isn't considered an error condition by most/all HTTP software.
Most Ruby software handles nil return from read as an indication that the network conversation is over (successfully). I suggest you do something like that.
If you wish to consider EOF an error, you could create a wrapper class for IO that would "upgrade" nil return from read into an exception of some kind, but I would suggest rethinking whether this is really what you need.
See also https://ruby-doc.org/core-3.0.0/IO.html#method-i-read.

Difference between NewChannel vs Request in ssh sftp server

I'm looking at the Go SFTP server example code
https://github.com/pkg/sftp/blob/master/examples/go-sftp-server/main.go
There are sections of code which are unclear to me:
_, chans, reqs, err := ssh.NewServerConn(nConn, config)
if err != nil {
    log.Fatal("failed to handshake", err)
}
fmt.Fprintf(debugStream, "SSH server established\n")

// The incoming Request channel must be serviced.
go ssh.DiscardRequests(reqs)

// Service the incoming Channel channel.
for newChannel := range chans {
    ...
}
First: with ssh.NewServerConn, if NewChannel (chans) represents an incoming request to open a channel, what is Request (reqs)? In other words, what is the difference between chans and reqs here?
Second: why is there a need to call ssh.DiscardRequests(reqs)?
Looking at the documentation for ssh.NewServerConn it appears that it returns the following:
*ServerConn
<-chan NewChannel
<-chan *Request
error
The second returned value, NewChannel, "represents an incoming request to a channel". The third returned value, Request, "is a request sent outside of the normal stream of data".
This doesn't really answer your questions, but it does provide helpful clues as to where to look.
So, to answer your questions:
chans receives connections that are new to the server. Using the received value from chans, you can either accept and communicate with that connection or just reject it. This can be thought of as multiple people logging into a remote machine via ssh and holding multiple sessions.
reqs holds global requests (which is defined here) sent to either the server or client that should not be sent to any specific channel. RFC4254 gives the example of a such a request as "start TCP/IP forwarding for a specific port".
You can see the internal usage of how the ssh package uses the incomingRequests channel here.
The documentation for ssh.NewServerConn explicitly states
The Request and NewChannel channels must be serviced, or the connection will hang.
In the event that this server does receive a global request it needs to be handled appropriately if the request is asking for a reply.
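If you wanted to service those requests yourself rather than discarding them, a minimal hand-rolled equivalent of ssh.DiscardRequests (assuming the reqs channel from the example) might look like:

go func() {
    for req := range reqs {
        // Reply is required when WantReply is true and is a no-op
        // otherwise; replying false rejects the request.
        req.Reply(false, nil)
    }
}()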
Apart from will7200's answer, I just want to add a couple of things which I found while reading around this.
SSH has global requests (SSH_MSG_GLOBAL_REQUEST) and channel requests (SSH_MSG_CHANNEL_REQUEST).
A channel is any specific terminal or session, i.e. how we see it when we send data between the ssh server and client.
So reqs here carries the global requests, while all channel-specific requests are wrapped inside their channel.
Global requests are requests that are not specific to a channel, like TCPKeepAlive (as mentioned in ssh_config) or starting TCP/IP forwarding for a specific port.
And ssh.DiscardRequests essentially consumes and rejects those requests (replying with failure to any that expect a reply).

How can I orchestrate concurrent request-response flow?

I'm new to concurrent programming, and have no idea what concepts to start with, so please be gentle.
I am writing a webservice as a front-end to a TCP server. This server listens to the port I give it, and returns the response to the TCP connection for each request.
Here is why I'm writing a web-service front-end for this server:
The server can handle one request at a time, and I'm trying to make it be able to process several inputs concurrently, by launching multiple processes and giving them a different port to listen on. For example, I want to launch 30 instances and tell them to listen on ports 20000-20029.
Our team uses PHP, and PHP does not have the capacity to launch server instances and maintain them concurrently, so I'm trying to write an API they can just send HTTP requests to.
So, here is the structure I have thought of.
I will have a main() function. This function launches the processes concurrently, then starts an HTTP server on port 80 and listens.
I have an http.Handler that adds the content of each request to a channel.
I will have goroutines, one per server instance, that run an infinite loop.
The code for the function mentioned in item three would be something like this:
func handleRequest(queue chan string) {
    for {
        request := <-queue
        conn, err := connectToServer()
        err = sendRequestToServer(conn, request)
        response, err := readResponseFromServer(conn)
        // ... handle err, and somehow get response back to the right handler
    }
}
So, my http.Handler can simply do something like queue <- request to add the request to the queue, and handleRequest, which has been blocking on the channel, will get the request and continue on. When done, the loop repeats: execution comes back to request := <-queue and the same thing continues.
My problem starts in the http.Handler. It makes perfect sense to put requests in a channel, because multiple goroutines are all listening to it. However, how can these goroutines return the result to my http.Handler?
One way is to use a channel, let's call it responseQueue, that all of these goroutines would write to. The problem is that when a response is added to the channel, I don't know which request it belongs to. In other words, when multiple http.Handlers send requests, each executing handler will not know which response the current message in the channel belongs to.
Is there a best practice, or a pattern, to send data to a goroutine from another goroutine and receive the data back?
Create a per request response channel and include it in the value sent to the worker. The handler receives from the channel. The worker sends the result to the channel.
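A minimal sketch of that pattern, where the job type, its field names, and the echo placeholder are illustrative assumptions:

package main

import (
    "fmt"
    "net/http"
)

// job carries the request payload plus a channel for this request's reply.
type job struct {
    payload string
    reply   chan string
}

var queue = make(chan job)

func worker(queue chan job) {
    for j := range queue {
        // Talk to the backing TCP server here; this just echoes.
        j.reply <- "processed: " + j.payload
    }
}

func handler(w http.ResponseWriter, r *http.Request) {
    reply := make(chan string, 1) // buffered so the worker never blocks sending
    queue <- job{payload: r.URL.Query().Get("q"), reply: reply}
    fmt.Fprintln(w, <-reply) // each handler reads only its own response
}

func main() {
    for i := 0; i < 30; i++ { // one worker per backing server instance
        go worker(queue)
    }
    http.HandleFunc("/", handler)
    http.ListenAndServe(":80", nil)
}

Because each request owns its reply channel, responses can never be delivered to the wrong handler, no matter which worker finishes first.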

using Go redis client (Redigo)

I'm using the Go Redis client Redigo to write images to ~20 Redis servers.
Speed is an important factor here, and I'm just sending SET commands to Redis, so I'm using Send and Flush without calling Receive.
After a few hours I'm getting "connection reset by peer" on the client.
I was wondering, does it have something to do with the fact that I don't call Receive?
Maybe my RX queue is just reaching its maximum capacity because I don't empty it with Receive?
Thank you.
An application must call Receive to clear the responses from the server and to check for errors. If the application is not pipelining commands, then it's best to call Do. Do combines Send, Flush and Receive.
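For the non-pipelined case that looks like this (assuming c is a dialed redis.Conn and key/value are your data):

// Do sends the command, flushes the output buffer, and reads the reply.
if _, err := c.Do("SET", key, value); err != nil {
    // handle error
}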
If you don't care about errors, then start a goroutine to read the responses:
go func(c redis.Conn) {
    for c.Err() == nil {
        c.Receive()
    }
}(c) // pass in the pipelined connection
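And if you do care about errors while pipelining, a sketch of queueing the SET commands and then draining one reply per command (the images slice and key format are assumptions):

c, err := redis.Dial("tcp", "localhost:6379")
if err != nil {
    // handle error
}
defer c.Close()

for i, img := range images {
    c.Send("SET", fmt.Sprintf("image:%d", i), img)
}
c.Flush()
// One reply arrives per queued command; reading them all keeps the
// connection's receive buffer from filling up and surfaces any errors.
for range images {
    if _, err := c.Receive(); err != nil {
        // handle error
    }
}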
