How many clients can connect to the server socket in MQL?

I want to connect more than 500 clients to the MQL (MetaTrader) server socket.
There is no description of this in the documentation: https://www.mql5.com/en/docs/network/socketcreate
How many clients can connect to the server and operate without problems?

Q : "I want to connect more than 500 clients to the MQL (MetaTrader) server ... How many clients can connect to the server and operate without problems?"
A : Not an easy task, indeed.
As you may already know, the whole MetaTrader 4/5 ecosystem is built as a distributed system, having a Terminal side (on your clients' machines) and a Server side (a multi-host platform located at the Broker's data centre). The Server side registers users, authenticates them and, besides many other duties, feeds them a latency-sensitive, high-volume stream of { CFD | FX | DeFi | * }-market QUOTE messages, whose volume scales with both market activity and the number of active clients (at the FX market the cadence easily reaches hundreds of Top-of-Book events/messages per millisecond), delivered to all authenticated, active { MT4 | MT5 } Terminal computers. It also accepts and executes XTO instructions from authenticated clients and reports the results (state changes performed and client-funds accounting operations) of those XTO-s back to the respective traders' terminals. That amount of work is, on the Broker side, split among several computers of the MetaTrader 4/5 Server infrastructure. The (web-)socket handling is served by one part of that Broker-side infrastructure.
Closer to your reach is the MetaTrader 4/5 Terminal, which you can program and control. Even here the resources are limited, as you can read in the documentation you linked, which describes the Terminal-side, not the Server-side, programming tools:
You can create a maximum of 128 sockets from one MQL5 program. If the limit is exceeded, the error 5271 (ERR_NETSOCKET_TOO_MANY_OPENED) is written to _LastError.
So, the Server side is controlled by the Broker (who owns the licence to use the MetaQuotes, Inc. product and has it configured for the expected performance envelope; being ready, or not, to handle an additional 50,000 web-socket connections for NTO-s might not be the Broker's core business priority, as they collect fees from XTO-s).
"(...) The question is, do we create new socket for each client to connect? As I know, we create the server socket just one time on the Oninit function, then on a timer or chart event handler, do accepting incoming client connection request. So, there is just one socket and many client connect to this socket. Am I right #user3666197 ? – Behzad 23 hours ago"
-&-
"I think my question is not clear. I have done this project. I bought a VPS then install a MT5 on it with the EA that has played the server role. The sever EA could accept 500 client without any problem. It can send and receive messages as well as one connection. For clients, on my pc create a loop to connect 500 connection to the server. One socket on the server EA. – Behzad 4 hours ago"
Given that you call an MT5-Client-Terminal a "server" of sorts (just a VPS-hosted MT5-Client-Terminal running a user-defined MQL5 Expert Advisor), there seems to be some magic involved:
(A) you claim to be able to "(...) accept 500 client without any problem.", which is in direct contradiction to the officially documented MQL5 limit of no more than 128 sockets ever opened from an MQL5-{ EA | Script } code;
(B) the official MQL5 documentation does not describe any way for an MT5-Client-Terminal running an MQL5-{ EA | Script } code to accept connections arriving asynchronously from remote clients (as of 2022-Q1, the MQL5-language functions practically rule this out);
(C) the official MQL5 documentation confirms that one can SocketConnect() from inside an MT5-Client-Terminal MQL5-{ EA | Script } code to a known TCP/IP:PORT address:
string KNOWN_ADDRESS  = "some.known.FQDN";
int    KNOWN_PORT     = 80,
       TimeoutMILLIS  = 1000;
bool   FLAG_ExtTLS    = false;
//+------------------------------------------------------------------+
...
int MyOUTGOINGsocket = SocketCreate();              //--- check the handle
if ( MyOUTGOINGsocket != INVALID_HANDLE )
{
    if ( SocketConnect( MyOUTGOINGsocket,           //--- from MT5-Terminal
                        KNOWN_ADDRESS,              //       to <_address_>
                        KNOWN_PORT,                 //       on <_port_>
                        TimeoutMILLIS               //      try <_millis_>
                        )                           //     else FAIL
         )
    {
        Print( "INF: Established connection to ",
               KNOWN_ADDRESS, ":",
               KNOWN_PORT
               );
        ...
    }
    else
    {
        Print( "ERR: Connection to ",
               KNOWN_ADDRESS, ":",
               KNOWN_PORT,
               " failed, error ",
               GetLastError()
               );
        ...
    }
    SocketClose( MyOUTGOINGsocket );                //--- close a socket to release RAM/resources
}
else
{
    Print( "ERR: Failed to even create a socket, error was ",
           GetLastError()
           );
    ...
}
...
...
//+------------------------------------------------------------------+
One may, for sure, use some other, DLL-#import-ed tools for similar tasks, yet as no MCVE-formulated problem description has been presented so far, it is hard to say anything more, except for the facts already described above.
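Purely as an illustration of what such an external counterpart could look like, here is a minimal sketch, assuming a plain Python process running outside MetaTrader on a hypothetical port 9090; the MT5 Terminals would SocketConnect() to it, as in point (C) above:

# Hypothetical external listener (NOT MQL5) that MT5 Terminals could SocketConnect() to.
import socket
import threading

HOST, PORT = "0.0.0.0", 9090          # hypothetical address/port, adjust to your VPS

def handle_client(conn, addr):
    # Echo whatever the terminal sends, newline-framed, until the peer disconnects.
    with conn:
        buff = b""
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break                  # peer closed the connection
            buff += chunk
            while b"\n" in buff:
                line, buff = buff.split(b"\n", 1)
                conn.sendall(b"ACK: " + line + b"\n")

def serve_forever():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((HOST, PORT))
    srv.listen(512)                    # backlog sized for several hundred terminals
    while True:
        conn, addr = srv.accept()      # one thread per connected terminal
        threading.Thread(target=handle_client, args=(conn, addr), daemon=True).start()

if __name__ == "__main__":
    serve_forever()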

You can use the WebRequest() function with an HTTP API from the MQL client side.
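As a hedged sketch of that idea, the tiny HTTP endpoint below (Python, with a hypothetical port 8080 and made-up JSON fields) is the kind of API an MQL program could poll via WebRequest(), once the URL is whitelisted in the Terminal's allowed-URL list:

# Minimal, hypothetical HTTP endpoint a WebRequest() call could target.
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

class QuoteHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Reply with a tiny JSON body; the fields are purely illustrative.
        body = json.dumps({"status": "ok", "message": "hello MQL client"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), QuoteHandler).serve_forever()  # port 8080 is an assumption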

There is an article explaining how to create a server on MT5:
Working with sockets in MQL, or How to become a signal provider
https://www.mql5.com/en/articles/2599

Related

MassTransit timeouts under load on .NET Framework under IIS

Under load in production we receive "RabbitMQ.Client.Exceptions.ConnectFailureException" (connection failed) and "MassTransit.RequestTimeoutException" (timeout waiting for response). The consumer does receive the message and sends the response back. It's as if the web app isn't listening, or is unable to accept the connection.
We're running an ASP.NET web application (not MVC) on .NET Framework 4.6.2 on Windows Server 2019 on IIS. We're using MassTransit 7.0.4. In production, under load, we get exceptions dealing with sockets on RabbitMQ or timeouts from MassTransit. It's difficult to reproduce them in Dev. RabbitMQ runs in a mirrored setup; the problem seems to happen once we turn on a high-load service that bumps the rate from 140 messages/sec to 250 messages/sec.
I have a few questions about the code architecture, and then if anyone else is running into these kinds of timeout issues.
Questions:
Should I have static scope for the IBusControl? I.e., should it be static inside Global.asax? And does it matter at all if it's a singleton underneath?
Should I create a new IBusControl and start it per request (maybe stick it in Application_BeginRequest)? Would that make a difference?
Would adding another worker process affect the total number of open connections I'm able to make, if this is a resource issue (exhausting threads, connections or some other resource)?
Exceptions:
MassTransit.RequestTimeoutException
Timeout Waiting for response
Stacktrace:
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification
MassTransit.Clients.ResponseHandlerConnectionHandle`1+<GetTask>d_11.MoveNext
System.Threading.ExecutionContext.RunInternal
RabbitMQ.Client.Exceptions.ConnectFailureException
Connection failed
Stacktrace:
RabbitMQ.Client.Impl.SocketFrameHandler.ConnectOrFail
RabbitMQ.Client.Impl.SocketFrameHandler.ConnectUsingAddressFamily
RabbitMQ.Client.Impl.SocketFrameHandler..ctor
RabbitMQ.Client.ConnectionFactory.CreateFrameHandler
RabbitMQ.Client.EndPointResolverExtensions.SelectOne
RabbitMQ.Client.ConnectionFactory.CreateConnection
How Our Code Works ( overview )
A static IBusControl that is instantiated the first time someone tries to produce a message. The whole connection-and-send code is a little too large to put in here (connection factory and other metric classes), but below are the interesting parts.
static IBusControl B;

B = Bus.Factory.CreateUsingRabbitMq(x =>
{
    hostAddress = host.HostAddress;
    x.Host(new Uri(host.HostAddress), h =>
    {
        h.Username(host.UserName);
        h.Password(host.Password);
    });
    x.Durable = false;
    x.SetQueueArgument("x-message-ttl", 600000);
});
B.Start(new TimeSpan(0, 0, 10));
// Then send the Actual Messages
// Generic with TRequest and TResponse : class BaseMessage
// Pulling the code out of a few different classes
string serviceAddressString = string.Format("{0}/{1}?durable={2}", HostAddress, ChkMassTransit.QueueName(typeof(TRequest), typeof(TResponse)), false ? "true" : "false");
Uri serviceAddress = new Uri(serviceAddressString);
RequestTimeout rt = RequestTimeout.After(0, 0, 0, 0, timeout.Value);
IRequestClient<TRequest> reqClient = B.CreateRequestClient<TRequest>(serviceAddress, rt);
var v = reqClient.GetResponse<TResponse>(request, sendInfo.CT, sendInfo.RT);
if ( v.Wait(timeoutMS) ) { /*do some stuff*/ }
First, I find your lack of async disturbing. Using Wait or anything like it on TPL-based code is a recipe for death and destruction, pain and suffering, dogs and cats living together, etc.
Yes, you should have a single bus instance that is started when the application starts. Since you're doing request/response, set AutoStart = true on the bus configurator to make sure it's all warmed up and ready.
Never, no, one bus only!
Each bus instance only has a single connection, so you shouldn't see any resource issues related to capacity on RabbitMQ.
MassTransit 7.0.4 is really old; you might consider the easy upgrade to 7.3.1 and see if that improves things for you. It's the last version of the v7 codebase available.

How to send byte message with ZeroMQ PUB / SUB setting?

So I'm new to ZeroMQ and I am trying to send a byte message over ZeroMQ, using a PUB / SUB setting.
Choice of programming language is not important for this question since I am using ZeroMQ for communication between multiple languages.
Here is my server code in python:
import zmq
import time
port = "5556"
context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind("tcp://*:%s" % port)
while True:
    socket.send(b'\x84\xa5Title\xa2hi\xa1y\xcb\x00\x00\x00\x00\x00\x00\x00\x00\xa1x\xcb#\x1c\x00\x00\x00\x00\x00\x00\xa4Data\x08')
    time.sleep(1)
and here is my client code in python:
import zmq
context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.connect("tcp://localhost:5556")
total_value = 0
for update_nbr in range(5):
    string = socket.recv()
    print(string)
My client simply blocks at string = socket.recv().
I have done some studying; apparently, if I were sending strings with a PUB / SUB setting, I would need to set some "topic filter" to make it work. But I am not sure how to do that when sending a byte message.
ZeroMQ defines protocols that guarantee cross-platform compatibility of both the behaviours and the message content.
The root cause: to start receiving messages, one must change the initial "topic-filter" state of the SUB-socket (which initially is the "receive nothing" zero-subscription).
ZeroMQ is a lovely set of tools, created around smart principles.
One of these says: do nothing on the SUB side until .setsockopt( zmq.SUBSCRIBE, ... ) explicitly says what to subscribe to, and only then start checking the incoming messages. (Older zmq fans remember the initial design, where the PUB side always distributed all messages down the road to each connected SUB "radio-broadcast receiver", and upon each message arrival the SUB side performed the "topic-filtering" on its own. Newer versions of zmq reverse this architecture and perform PUB-side filtering.)
Anyway, the initial state of the "topic-filter" makes sense. Who knows what ought to be received a priori? Nobody. So receive nothing.
Given you need or wish to start the job, an easy move is to subscribe to everything, i.e. to let any message get through.
Yes, that simple .setsockopt( zmq.SUBSCRIBE, "" )
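Applied to the client from the question (same tcp://localhost:5556 endpoint), a minimal corrected sketch would be:

import zmq

context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.connect("tcp://localhost:5556")
socket.setsockopt(zmq.SUBSCRIBE, b"")   # subscribe to everything, otherwise recv() blocks forever

for update_nbr in range(5):
    message = socket.recv()             # raw bytes, exactly as sent by the PUB side
    print(message)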
If one needs some key-based processing and the messages are of a reasonable size (no giga-BLOBs), one may simply prefix a key (or a byte-field, if more hacky) in front of the message string (or of a payload byte-field).
Sure, one may save some fraction of the transport-layer overhead where the zmq filtering is performed on the PUB side (not the case for the older API versions); otherwise it is typically no big deal to subscribe to receive "anything" and to check each message for a pre-assembled context key (a prefix substring, a byte-field, etc.) before processing the rest of the payload.
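A small sketch of that prefix-key idea, with the topic string "sensorA" and port 5557 chosen purely for illustration:

import time
import zmq

ctx = zmq.Context()

# PUB side: the key goes in front of the payload bytes, separated by a space.
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://*:5557")                      # port chosen for this example only

# SUB side: the subscription is a plain byte-prefix match on the whole frame.
sub = ctx.socket(zmq.SUB)
sub.connect("tcp://localhost:5557")
sub.setsockopt(zmq.SUBSCRIBE, b"sensorA")     # only frames starting with b"sensorA" get through

time.sleep(0.5)                               # give the slow-joining SUB time to subscribe

pub.send(b"sensorA " + b"\x84\xa5raw-bytes")  # key + raw payload in one frame
key, _, payload = sub.recv().partition(b" ")  # split the key off before processing the payload
print(key, payload)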
The Best Next Step:
If your code strives to reach a production state, not to remain just an academic example, there will be much more work to be done to provide survivability measures for the hostile real-world production environments.
An absolutely great perspective for doing this, and a good read for realistic designs with ZeroMQ, is Pieter HINTJENS' book "Code Connected, Vol. 1" (you may check my posts on ZeroMQ to find the book's direct pdf link).
Another good read comes from Martin SUSTRIK, the co-father of ZeroMQ, on the low-level truths about the ZeroMQ implementation details & scalability.

How to connect queues to a ZeroMQ PUB/SUB

Consider the following:
a set of 3 logical services: S1, S2 and S3
two instances of each service are running, so we have the following processes: S1P1, S1P2, S2P1, S2P2, S3P1, S3P2
a ZeroMQ broker running in a single process and reachable by all service processes
A logical service, let's say S1, publishes a message M1 that is of interest to logical services S2 and S3. Only one process of each logical service must receive M1, so let's say S2P1 and S3P2.
I have tried the following, but without success:
broker thread 1 is running a XSUB/XPUB proxy
broker thread 2 is running a ROUTER/DEALER proxy with the ROUTER connected to the XPUB socket and subscribed to everything (for logical S1)
broker thread 3 is running a ROUTER/DEALER proxy with the ROUTER connected to the XPUB socket and subscribed to everything (for logical S2)
broker thread 4 is running a ROUTER/DEALER proxy with the ROUTER connected to the XPUB socket and subscribed to everything (for logical S3)
each logical service process is running a REP socket thread connected to the broker DEALER socket
I figured that the XSUB/XPUB proxy would give me publish/subscribe semantics and that the ROUTER/DEALER proxies would introduce a competition between the REP sockets for the messages sent by the XSUB/XPUB proxy.
How can I combine ZeroMQ sockets to accomplish this?
Update1
I know "without success" isn't helpful; I've tried different configurations and got different errors. The latest configuration I tried is the following:
(XSUB proxy=> XPUB) => (SUB copyLoop=> REQ) => (ROUTER proxy=> DEALER) => REP
The copyLoop goes like this:
public void start() {
    context = ZMQ.context(1);

    subSocket = context.socket(ZMQ.SUB);
    subSocket.connect(subSocketUrl);
    subSocket.subscribe("".getBytes());

    reqSocket = context.socket(ZMQ.REQ);
    reqSocket.connect(reqSocketUrl);

    while (!Thread.currentThread().isInterrupted()) {
        final Message msg = receiveNextMessage();
        resendMessage(msg);
    }
}

private Message receiveNextMessage() {
    final String header = subSocket.recvStr();
    final String entity = subSocket.recvStr();
    return new Message(header, entity);
}

private void resendMessage(Message msg) {
    reqSocket.sendMore(msg.getKey());
    reqSocket.send(msg.getData(), 0);
}
The exception I get is the following:
java.lang.IllegalStateException: Cannot send another request
at zmq.Req.xsend(Req.java:51) ~[jeromq-0.3.4.jar:na]
at zmq.SocketBase.send(SocketBase.java:613) ~[jeromq-0.3.4.jar:na]
at org.zeromq.ZMQ$Socket.send(ZMQ.java:1206) ~[jeromq-0.3.4.jar:na]
at org.zeromq.ZMQ$Socket.sendMore(ZMQ.java:1189) ~[jeromq-0.3.4.jar:na]
at com.xyz.messaging.zeromq.SubReqProxyConnector.resendMessage(SubReqProxyConnector.java:47) ~[classes/:na]
at com.xyz.messaging.zeromq.SubReqProxyConnector.start(SubReqProxyConnector.java:35) ~[classes/:na]
I'm running JeroMQ 0.3.4, Oracle Java 8 JVM and Windows 7.
You seem to be adding in some complexity with your ROUTER connection - you should be able to do everything connected directly to your publisher.
The error you're currently running into is that REQ sockets have a strict message ordering pattern - you are not allowed to send() twice in a row, you must send/receive/send/receive/etc (likewise, REP sockets must receive/send/receive/send/etc). From what it looks like, you're just doing send/send/send/etc on your REQ socket without ever receiving a response. If you don't care about a response from your peer, then you must receive and discard it or use DEALER (or ROUTER, but DEALER makes more sense in your current diagram).
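For the forwarding loop specifically, here is a minimal sketch of the DEALER alternative suggested above, written in Python / pyzmq for brevity (the question uses JeroMQ, but the mechanics are identical); the endpoint and key names are placeholders, and the empty delimiter frame is what lets a DEALER talk to a REP:

import zmq

ctx = zmq.Context()

# Service side: a REP socket, strict recv/send alternation.
rep = ctx.socket(zmq.REP)
rep.bind("tcp://*:5560")                      # placeholder endpoint

# Forwarder side: DEALER instead of REQ, so we may send many messages in a row.
dealer = ctx.socket(zmq.DEALER)
dealer.connect("tcp://localhost:5560")

for i in range(3):
    # When a DEALER talks to a REP it must emulate the REQ envelope:
    # an empty delimiter frame followed by the payload frames.
    dealer.send_multipart([b"", b"key-%d" % i, b"payload-%d" % i])

for i in range(3):
    key, data = rep.recv_multipart()          # REP strips the empty delimiter itself
    rep.send(b"ACK")                          # REP must answer before its next recv
    _, ack = dealer.recv_multipart()          # acknowledgement comes back with the empty frame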
I've created a diagram of how I would accomplish this architecture below - using your basic process structure.
Diagram, in words: Broker T1 holds the PUB socket. Broker T2, Broker T3 and Broker T4 each run a SUB socket connected to that PUB, paired with a DEALER socket for their own logical service. The service processes connect REP sockets to the DEALER of their service: S1P1 and S1P2 to Broker T2, S2P1 and S2P2 to Broker T3, S3P1 and S3P2 to Broker T4.
So, the main difference is that I've ditched your (SUB copyLoop=> REQ) step. Whether you choose XPUB/XSUB vs PUB/SUB is up to you, but I would tend to start simpler unless you currently want to make use of the extra features of XPUB/XSUB.
Obviously this diagram doesn't deal with how information enters your broker, where you currently show an XSUB socket; that's out of scope for the information you've provided thus far. Presumably you're able to receive information into your broker successfully already, so I won't deal with that.
I assume your broker threads that are dedicated to each service are making intelligent choices on whether to send the message to their service or not? If so, then your choice of having them subscribed to everything should work fine, otherwise more intelligent subscription setups might be necessary.
If you're using a REP socket on your service processes, then the service process must take that message and deal with it asynchronously, never communicating back any details about that message to the broker. It must then respond to each message with an acknowledgement (like "RECEIVED") so that it follows the strict receive/send/receive/send pattern for REP sockets.
If you want any other type of communication about how the service handles that message sent back to the broker, REP is no longer the appropriate socket type for your service processes, and DEALER may no longer be the correct socket type for your broker. If you want some form of load balancing so that you send to the next open service process, you'll need to use ROUTER/REQ and have each service indicate its availability and have the broker hold on to the message until the next service process says its available by sending results back. If you want some other type of message handling, you'll have to indicate what that is so a suitable architecture can be proposed.
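Should that load-balancing route ever be needed, a minimal sketch of the availability-signalling idea (ROUTER at the broker, REQ at the workers) might look like the following, again in Python / pyzmq for brevity, with the endpoint and message contents being placeholders:

import zmq

ctx = zmq.Context()

# Broker side: ROUTER bound for the workers of one logical service.
router = ctx.socket(zmq.ROUTER)
router.bind("tcp://*:5570")                        # placeholder endpoint

# Worker side: REQ announces availability, then waits for work.
worker = ctx.socket(zmq.REQ)
worker.connect("tcp://localhost:5570")
worker.send(b"READY")                              # "I am available"

# Broker: remember who is idle, hand the next task to an idle worker.
idle = []
worker_id, empty, signal = router.recv_multipart() # [identity, b"", b"READY"]
idle.append(worker_id)

task = b"M1"                                       # the message to be load-balanced
router.send_multipart([idle.pop(0), b"", task])    # route the task to one idle worker

# Worker: process the task, then report back (which also signals availability again).
job = worker.recv()
worker.send(b"done:" + job)

result_from, empty, result = router.recv_multipart()
idle.append(result_from)                           # that worker is idle again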
Clearly I got mixed up with a few elements:
Sockets have the same API whether you're using it as a client-side socket (Socket.connect) or a server-side socket (Socket.bind)
Sockets have the same API regardless of the type (e.g. Socket.subscribe should not be called on a PUSH socket)
Some socket types require a send/receive response loop (e.g. REQ/REP)
Some nuances in communication patterns (PUSH/PULL vs ROUTER/DEALER)
The difficulty (impossibility?) of debugging a ZeroMQ setup
So a big thanks to Jason for his incredibly detailed answer (and awesome diagram!) that pointed me to the right direction.
I ended up with the following design:
broker thread 1 is running a fan-out XSUB/XPUB proxy on bind(localhost:6000) and bind(localhost:6001)
broker thread 2 is running a queuing SUB/PUSH proxy on connect(localhost:6001) and bind(localhost:6002); broker threads 3 and 4 use a similar design with different bind port numbers
message producers connect to the broker using a PUB socket on connect(localhost:6000)
message consumers connect to the broker queuing proxy using a PULL socket on connect(localhost:6002)
On top of this service-specific queuing mechanism, I was able to add a similar service-specific fan-out mechanism rather simply:
broker thread runs a SUB/PUB proxy on connect(localhost:6001) and bind(localhost:6003)
message producers still connect to the broker using a PUB socket on connect(localhost:6000)
message consumers connect to the broker fan-out proxy using a SUB socket on connect(localhost:6003)
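For reference, a minimal pyzmq sketch of the broker side of this design (Python used for brevity, the original project is Java; port numbers as listed above, each proxy normally living in its own broker thread):

import threading
import zmq

ctx = zmq.Context()

def fanout_proxy():
    # Broker thread 1: XSUB/XPUB fan-out, producers PUB-connect to :6000.
    xsub = ctx.socket(zmq.XSUB)
    xsub.bind("tcp://127.0.0.1:6000")
    xpub = ctx.socket(zmq.XPUB)
    xpub.bind("tcp://127.0.0.1:6001")
    zmq.proxy(xsub, xpub)                      # blocks, shuffling messages both ways

def queuing_proxy():
    # Broker thread 2: SUB/PUSH queue for one service, consumers PULL-connect to :6002.
    sub = ctx.socket(zmq.SUB)
    sub.connect("tcp://127.0.0.1:6001")
    sub.setsockopt(zmq.SUBSCRIBE, b"")         # or a service-specific topic prefix
    push = ctx.socket(zmq.PUSH)
    push.bind("tcp://127.0.0.1:6002")
    zmq.proxy(sub, push)                       # PUSH round-robins to the connected PULL workers

threading.Thread(target=fanout_proxy, daemon=True).start()
threading.Thread(target=queuing_proxy, daemon=True).start()
threading.Event().wait()                       # keep the broker process alive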
This has been an interesting ride.

How does ZeroMQ connect and bind work internally

I am experimenting with ZeroMQ, and I found it really interesting that in ZeroMQ it does not matter whether connect or bind happens first. I tried looking into the source code of ZeroMQ, but it was too big to find anything.
The code is as follows.
# client side
import zmq
ctx = zmq.Context()
socket = ctx.socket(zmq.PAIR)
socket.connect('tcp://*:2345') # line [1]
# make it wait here
# server side
import zmq
ctx = zmq.Context()
socket = ctx.socket(zmq.PAIR)
socket.bind('tcp://localhost:2345')
# make it wait here
If I start client side first, the server has not been started yet, but magically the code is not blocked at line [1]. At this point, I checked with ss and made sure that the client is not listening on any port. Nor does it have any open connection. Then I start the server. Now the server is listening on port 2345, and magically the client is connected to it. My question is how does the client know the server is now online?
The best place to ask your question is the ZMQ mailing list, as many of the developers (and founders!) of the library are active there and can answer your question directly, but I'll give it a try. I'll admit that I'm not a C developer so my understanding of the source is limited, but here's what I gather, mostly from src/tcp_connector.cpp (other transports are covered in their respective files and may behave differently).
Line 214 starts the open() method, and here looks to be the meat of what's going on.
To answer your question about why the code is not blocked at Line [1], see line 258. It's specifically calling a method to make the socket behave asynchronously (for specifics on how unblock_socket() works you'll have to talk to someone more versed in C, it's defined here).
On line 278, it attempts to make the connection to the remote peer. If it's successful immediately, you're good, the bound socket was there and we've connected. If it wasn't, on line 294 it sets the error code to EINPROGRESS and fails.
To see what happens then, we go back to the start_connecting() method on line 161. This is where the open() method is called from, and where the EINPROGRESS error is used. My best understanding of what's happening here is that if at first it does not succeed, it tries again, asynchronously, until it finds its peer.
I think the best answer is in zeromq wiki
When should I use bind and when connect?
As a very general advice: use bind on the most stable points in your architecture and connect from the more volatile endpoints. For request/reply the service provider might be point where you bind and the client uses connect. Like plain old TCP.
If you can't figure out which parts are more stable (i.e. peer-to-peer), think about a stable device in the middle, which both sides can connect to.
The question of bind or connect is often overemphasized. It's really just a matter of what the endpoints do and if they live long — or not. And this depends on your architecture. So build your architecture to fit your problem, not to fit the tool.
And
Why do I see different behavior when I bind a socket versus connect a socket?
ZeroMQ creates queues per underlying connection, e.g. if your socket is connected to 3 peer sockets there are 3 messages queues.
With bind, you allow peers to connect to you, thus you don't know how many peers there will be in the future and you cannot create the queues in advance. Instead, queues are created as individual peers connect to the bound socket.
With connect, ZeroMQ knows that there's going to be at least a single peer and thus it can create a single queue immediately. This applies to all socket types except ROUTER, where queues are only created after the peer we connect to has acknowledged our connection.
Consequently, when sending a message to bound socket with no peers, or a ROUTER with no live connections, there's no queue to store the message to.
When you call socket.connect('tcp://*:2345') or socket.bind('tcp://localhost:2345') you are not calling these methods directly on an underlying TCP socket. All of ZMQ's IO - including connecting/binding underlying TCP sockets - happens in threads that are abstracted away from the user.
When these methods are called on a ZMQ socket it essentially queues these events within the IO threads. Once the IO threads begin to process them they will not return an error unless the event is truly impossible, otherwise they will continually attempt to connect/reconnect.
This means that a ZMQ socket may return without an error even if socket.connect is not successful. In your example it would likely fail without error, but then quickly reattempt and succeed once you run the server side of the script.
It may also allow you to send messages while in this state (depending on the state of the queue in this situation, rather than the state of the network) and will then attempt to transmit the queued messages once the IO threads are able to successfully connect. This also applies if a working TCP connection is later lost: the queues may continue to accept messages for the unconnected socket while the IO threads attempt to automatically re-establish the lost connection in the background. If the endpoint takes a while to come back online, it should still receive its messages.
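A small pyzmq sketch of that queuing behaviour, reusing the PAIR sockets and port 2345 from the question (the sleep simply exaggerates the late bind):

import threading
import time
import zmq

ctx = zmq.Context()

def late_server():
    time.sleep(2)                               # the bind happens well after the connect
    server = ctx.socket(zmq.PAIR)
    server.bind("tcp://127.0.0.1:2345")
    print("server received:", server.recv())

threading.Thread(target=late_server).start()

client = ctx.socket(zmq.PAIR)
client.connect("tcp://127.0.0.1:2345")          # returns immediately, no listener yet
client.send(b"hello")                           # queued locally until the connection succeeds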
To better explain here's another example
<?php
$pid = pcntl_fork();
if ($pid)
{
    $context = new ZMQContext();
    $client  = new ZMQSocket($context, ZMQ::SOCKET_REQ);
    try
    {
        $client->connect("tcp://0.0.0.0:9000");
    } catch (ZMQSocketException $e)
    {
        var_dump($e);
    }
    $client->send("request");
    $msg = $client->recv();
    var_dump($msg);
}
else
{
    // in spawned process
    echo "waiting 2 seconds\n";
    sleep(2);
    $context = new ZMQContext();
    $server  = new ZMQSocket($context, ZMQ::SOCKET_REP);
    try
    {
        $server->bind("tcp://0.0.0.0:9000");
    } catch (ZMQSocketException $e)
    {
        var_dump($e);
    }
    $msg = $server->recv();
    $server->send("response");
    var_dump($msg);
}
The binding process does not begin until 2 seconds after the connecting process. But once the child process wakes and successfully binds, the req/rep transaction takes place without error.
jason#jason-VirtualBox:~/php-dev$ php play.php
waiting 2 seconds
string(7) "request"
string(8) "response"
If I were to replace tcp://0.0.0.0:9000 on the binding socket with tcp://0.0.0.0:2345, it would hang because the client is trying to connect to tcp://0.0.0.0:9000, yet still without error.
But if I replace both with tcp://localhost:2345, I get an error on my system because it can't bind on localhost, making the call truly impossible.
object(ZMQSocketException)#3 (7) {
["message":protected]=>
string(38) "Failed to bind the ZMQ: No such device"
["string":"Exception":private]=>
string(0) ""
["code":protected]=>
int(19)
["file":protected]=>
string(28) "/home/jason/php-dev/play.php"
["line":protected]=>
int(40)
["trace":"Exception":private]=>
array(1) {
[0]=>
array(6) {
["file"]=>
string(28) "/home/jason/php-dev/play.php"
["line"]=>
int(40)
["function"]=>
string(4) "bind"
["class"]=>
string(9) "ZMQSocket"
["type"]=>
string(2) "->"
["args"]=>
array(1) {
[0]=>
string(20) "tcp://localhost:2345"
}
}
}
["previous":"Exception":private]=>
NULL
}
If you need real-time information about the state of the underlying sockets, you should look into socket monitors. Using socket monitors along with ZMQ poll allows you to poll for both socket events and queue events.
Keep in mind that polling a monitor socket via ZMQ poll is not the same as polling a ZMQ_FD resource via select, epoll, etc. The ZMQ_FD is edge-triggered and therefore doesn't behave the way you would expect when polling network resources, whereas a monitor socket within ZMQ poll is level-triggered. Also, monitor sockets are very lightweight, and the latency between the system event and the resulting monitor event is typically sub-microsecond.
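As a hedged sketch of the socket-monitor idea in pyzmq (the endpoint is a placeholder; recv_monitor_message is pyzmq's helper for decoding monitor frames):

import zmq
from zmq.utils.monitor import recv_monitor_message

ctx = zmq.Context()
sock = ctx.socket(zmq.PAIR)

# Ask libzmq to publish connection events for this socket on an inproc monitor socket.
monitor = sock.get_monitor_socket(zmq.EVENT_CONNECTED | zmq.EVENT_DISCONNECTED)

sock.connect("tcp://127.0.0.1:2345")            # placeholder endpoint

poller = zmq.Poller()
poller.register(monitor, zmq.POLLIN)

while True:
    if poller.poll(timeout=1000):
        event = recv_monitor_message(monitor)   # dict with 'event', 'value', 'endpoint'
        print("monitor event:", event)
        if event["event"] == zmq.EVENT_CONNECTED:
            break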

TCP socket stops receiving data until closed

I have a really weird problem that is driving me crazy.
I have a Ruby server and a Flash client (Action Script 3). It's a multiplayer game.
The problem is that everything is working perfectly and then, suddenly, a random player stops receiving data. When the server closes the connection because of inactivity, about 20-60 seconds later, the client receives all the buffered data.
The client uses XMLsocket for retrieving data, so the way the client receives data is not the problem.
socket.addEventListener(Event.CONNECT, connectHandler);

function connectHandler(event)
{
    sendData(sess);
}

function sendData(dat)
{
    trace("SEND: " + dat);
    addDebugData("SEND: " + dat)
    if (socket.connected) {
        socket.send(dat);
    } else {
        addDebugData("SOCKET NOT CONNECTED")
    }
}

socket.addEventListener(DataEvent.DATA, dataHandler);

function dataHandler(e:DataEvent) {
    var data:String = e.data;
    workData(data);
}
The server flushes data after every write, so it is not a flushing problem:
sock.write(data + DATAEOF)
sock.flush()
DATAEOF is a null char, so the client can parse the string.
When the server accepts a new socket, it sets sync to true, to autoflush, and TCP_NODELAY to true too:
newsock = serverSocket.accept
newsock.sync = true
newsock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, true)
This is my research:
Info: I was dumping netstat data to a file each second.
When the client stops receiving data, netstat shows that the socket status is still ESTABLISHED.
Some seconds after that, the send queue grows in line with the data being sent.
tcpflow shows that packets are sent 2 times.
When the server closes the socket, the socket status changes to FIN_WAIT1, as expected. Then tcpflow shows that all the buffered data is sent to the client, but the client doesn't receive it. Some seconds after that, the connection disappears from netstat and tcpflow shows that the same data is sent again; this time the client receives it, so it starts sending data to the server and the server receives it. But it's too late... the server has already closed the connection.
I don't think it's an OS/network problem, because I've changed from a VPS located in Spain to Amazon EC2 located in Ireland and the problem still remains.
I don't think it's a client network problem either, because this occurs dozens of times per day, and the average number of online users is about 45-55, with about 400 unique users a day, so the ratio is extremely high.
EDIT:
I've done more research. I've changed the server to C++.
When a client stops sending data, after a while the server receives a "Connection reset by peer" error. At that moment, tcpdump shows me that the client sent an RST packet. This could be because the client closed the connection and the server tried to read, but... why did the client close the connection? I think the answer is that the client is not the one closing the connection; it is the kernel. Here is some info: http://scie.nti.st/2008/3/14/amazon-s3-and-connection-reset-by-peer
Basically, as I understand it, Linux kernels 2.6.17+ increased the maximum size of the TCP window/buffer, and this started to cause other gear to wig out, if it couldn’t handle sufficiently large TCP windows. The gear would reset the connection, and we see this as a “Connection reset by peer” message.
I've followed the steps and now it seems that the server is closing connections only when the client loses its connection to the internet.
I'm going to add this as an answer so people know a bit more about this.
I think the answer is that the kernel is the one closing the connection. Here is some info: http://scie.nti.st/2008/3/14/amazon-s3-and-connection-reset-by-peer
Basically, as I understand it, Linux kernels 2.6.17+ increased the maximum size of the TCP window/buffer, and this started to cause other gear to wig out, if it couldn’t handle sufficiently large TCP windows. The gear would reset the connection, and we see this as a “Connection reset by peer” message.
I've followed the steps and now it seems that the server is closing connections only when the client loses its connection to the internet.
