MassTransit timeouts under load on .NET Framework under IIS - masstransit

Under load in production we receive "RabbitMQ.Client.Exceptions.ConnectFailureException" (connection failed) and "MassTransit.RequestTimeoutException" (timeout waiting for response). The consumer does receive the message and sends the response back. It's as if the web app isn't listening, or is unable to accept the connection.
We're running an ASP.NET web application (not MVC) on .NET Framework 4.6.2 on Windows Server 2019 under IIS, using MassTransit 7.0.4. In production, under load, we get exceptions dealing with sockets on RabbitMQ or timeouts from MassTransit. They are difficult to reproduce in Dev. RabbitMQ runs as a mirrored cluster, and the problem seems to start once we turn on a high-load service that bumps us from 140 messages/sec to 250 messages/sec.
I have a few questions about the code architecture, and then if anyone else is running into these kinds of timeout issues.
Questions:
Should the IBusControl have static scope, i.e., should it be static inside Global.asax? And does it matter at all whether it's a singleton underneath?
Should I create and start a new IBusControl per request (maybe in Application_BeginRequest)? Would that make a difference?
Would adding another worker process affect the total number of open connections I'm able to make, if this is a resource issue (exhausting threads, connections, or some other resource)?
Exceptions:
MassTransit.RequestTimeoutException
Timeout Waiting for response
Stacktrace:
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification
MassTransit.Clients.ResponseHandlerConnectionHandle`1+<GetTask>d_11.MoveNext
System.Threading.ExecutionContext.RunInternal
RabbitMQ.Client.Exceptions.ConnectFailureException
Connection failed
Stacktrace:
RabbitMQ.Client.Impl.SocketFrameHandler.ConnectOrFail
RabbitMQ.Client.Impl.SocketFrameHandler.ConnectUsingAddressFamily
RabbitMQ.Client.Impl.SocketFrameHandler..ctor
RabbitMQ.Client.ConnectionFactory.CreateFrameHandler
RabbitMQ.Client.EndPointResolverExtensions.SelectOne
RabbitMQ.Client.ConnectionFactory.CreateConnection
How Our Code Works (overview)
A static IBusControl is instantiated the first time someone tries to produce a message. The whole connection-and-send code is a little too large to include here (connection factory and other metric classes), but below are the interesting parts.
static IBusControl B;
B = Bus.Factory.CreateUsingRabbitMq(x =>
{
    hostAddress = host.HostAddress;
    x.Host(new Uri(host.HostAddress), h =>
    {
        h.Username(host.UserName);
        h.Password(host.Password);
    });
    x.Durable = false;
    x.SetQueueArgument("x-message-ttl", 600000);
});
B.Start(new TimeSpan(0, 0, 10));
// Then send the Actual Messages
// Generic with TRequest and TResponse : class BaseMessage
// Pulling the code out of a few different classes
string serviceAddressString = string.Format("{0}/{1}?durable={2}", HostAddress, ChkMassTransit.QueueName(typeof(TRequest), typeof(TResponse)), false ? "true" : "false");
Uri serviceAddress = new Uri(serviceAddressString);
RequestTimeout rt = RequestTimeout.After(0, 0, 0, 0, timeout.Value);
IRequestClient<TRequest> reqClient = B.CreateRequestClient<TRequest>(serviceAddress, rt);
var v = reqClient.GetResponse<TResponse>(request, sendInfo.CT, sendInfo.RT);
if ( v.Wait(timeoutMS) ) { /*do some stuff*/ }

First, I find your lack of async disturbing. Using Wait or anything like it on TPL-based code is a recipe for death and destruction, pain and suffering, dogs and cats living together, etc.
Yes, you should have a single bus instance that is started when the application starts. Since you're doing request/response, set AutoStart = true on the bus configurator to make sure it's all warmed up and ready.
Never, no, one bus only!
Each bus instance only has a single connection, so you shouldn't see any resource issues related to capacity on RabbitMQ.
MassTransit 7.0.4 is really old; you might consider the easy upgrade to 7.3.1 and see if that improves things for you. It's the last version of the v7 codebase available.
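To make that concrete, here is a minimal sketch of what the advice above amounts to: one application-scoped bus with AutoStart set on the bus configurator, started once at application start, and the request made with await instead of Wait(). The host address, credentials, and the MyRequest/MyResponse types are hypothetical placeholders, not taken from the original code:

using System;
using System.Threading;
using System.Threading.Tasks;
using MassTransit;

public static class BusConfig
{
    // One bus for the whole application; never one per request
    public static readonly IBusControl Bus = MassTransit.Bus.Factory.CreateUsingRabbitMq(cfg =>
    {
        cfg.Host(new Uri("rabbitmq://localhost/"), h =>
        {
            h.Username("guest");
            h.Password("guest");
        });
        cfg.AutoStart = true; // per the advice above: warm the bus up for request/response
    });
}

// Global.asax: start the bus once, when the application starts
protected void Application_Start()
{
    BusConfig.Bus.Start(TimeSpan.FromSeconds(10));
}

// Somewhere on an async code path: no Wait(), no blocked thread-pool threads
public static async Task<MyResponse> SendAsync(MyRequest request, Uri serviceAddress, CancellationToken ct)
{
    var client = BusConfig.Bus.CreateRequestClient<MyRequest>(serviceAddress, RequestTimeout.After(s: 30));
    var response = await client.GetResponse<MyResponse>(request, ct);
    return response.Message;
}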

Related

How many clients can connect to the server socket in MQL?

I want to connect more than 500 hundred clients to the MQL (MetaTrader) server socket.
There is no description of this in the documentation: https://www.mql5.com/en/docs/network/socketcreate
How many clients can connect to the server and operate without problems?
Q :" I want to connect more than 500 hundred client to the MQL (Meta Trader) server ... How many client can connect to the sever and deal with no problem? "
A :Not an easy task, indeed.
As you may already know, the MetaTrader 4/5 ecosystem is built as a distributed system with a Terminal side (on your clients' side) and a Server side (a multi-host platform located at the Broker's data centre). The Server side registers users, authenticates them, and feeds a latency-sensitive, high-volume stream of { CFD | FX | DeFi | * } market QUOTE messages (easily reaching a cadence of hundreds of top-of-book events/messages per millisecond on the FX market, scaled by market volume and the number of active clients) to all authenticated, active { MT4 | MT5 } Terminal computers. It also accepts and executes XTO instructions from authenticated clients and reports the results (state changes performed and client-funds accounting operations) back to the respective traders' terminals. That amount of work is, on the Broker side, split among several MetaTrader 4/5 Server infrastructure computers, and the web-socket handling is served by one part of that Broker-side infrastructure.
Closer to your reach is the MetaTrader 4/5 Terminal, which you can program and control. Even here the resources are limited, as you can read in the documentation you linked (which covers the Terminal-side programming tools, not the Server side):
You can create a maximum of 128 sockets from one MQL5 program. If the limit is exceeded, the error 5271 (ERR_NETSOCKET_TOO_MANY_OPENED) is written to _LastError.
So the Server side is controlled by the Broker (who owns the licence to use the MetaQuotes, Inc. product and configures it for the expected performance envelope; being ready to handle an additional 50,000 web-socket connections for NTOs might not be the Broker's core business priority, as they collect fees from XTOs).
"(...) The question is, do we create new socket for each client to connect? As I know, we create the server socket just one time on the Oninit function, then on a timer or chart event handler, do accepting incoming client connection request. So, there is just one socket and many client connect to this socket. Am I right #user3666197 ? – Behzad 23 hours ago"
-&-
"I think my question is not clear. I have done this project. I bought a VPS then install a MT5 on it with the EA that has played the server role. The sever EA could accept 500 client without any problem. It can send and receive messages as well as one connection. For clients, on my pc create a loop to connect 500 connection to the server. One socket on the server EA. – Behzad 4 hours ago"
Given that you call the MT5 Client Terminal a "server" in a sense (just a VPS-hosted MT5 Client Terminal running user-defined MQL5 ExpertAdvisor code), there seems to be some magic:
(A) you claim to be able to "(...) accept 500 client without any problem.", which directly contradicts the official MQL5-documented limit of no more than 128 sockets ever opened from MQL5 { EA | Script } code;
(B) the official MQL5 documentation does not present a way for an MT5 Client Terminal running MQL5 { EA | Script } code to receive connections arriving asynchronously from remote clients (as of 2022-Q1, the MQL5-language functions practically prevent such a thing from happening);
(C) the official MQL5 documentation confirms that one can SocketConnect() from inside MT5 Client Terminal MQL5 { EA | Script } code to a known TCP/IP:PORT address:
string KNOWN_ADDRESS = "some.known.FQDN";
int KNOWN_PORT = 80,
TimeoutMILLIS = 1000;
bool FLAG_ExtTLS = false;
//+------------------------------------------------------------------+
...
int MyOUTGOINGsocket = SocketCreate(); //--- check the handle
if ( MyOUTGOINGsocket != INVALID_HANDLE )
{
if ( SocketConnect( MyOUTGOINGsocket, //--- from MT5-Terminal
KNOWN_ADDRESS, // to <_address_>
KNOWN_PORT, // on <_port_>
TimeoutMILLIS // try <_millis_>
) // else FAIL
)
{
Print( "INF: Established connection to ",
KNOWN_ADDRESS, ":",
KNOWN_PORT
);
...
}
else
{
Print( "ERR: Connection to ",
KNOWN_ADDRESS, ":",
KNOWN_PORT,
" failed, error ",
GetLastError()
);
...
}
SocketClose( MyOUTGOINGsocket ); //--- close a socket to release RAM/resources
}
else
{ Print( "ERR: Failed to even create a socket, error was ",
GetLastError()
);
...
}
...
...
//+------------------------------------------------------------------+
One may, of course, use other DLL-#import-ed tools for similar tasks, yet as no MCVE-formulated problem description has been presented so far, it is hard to say anything more beyond the facts already described above.
You can use the WebRequest() method with an API from the MQL client.
There is an article explaining how to create a server on MT5:
Working with sockets in MQL, or How to become a signal provider
https://www.mql5.com/en/articles/2599

How to set up a ZeroMQ request-reply between a C# and Python application

I'm trying to communicate between a C# (5.0) and a Python (3.9) application via ZeroMQ. For .NET I'm using NetMQ, and for Python PyZMQ.
I have no trouble letting two applications communicate, as long as they are in the same language
C# app to C# app;
Python -> Python;
Java -> Java,
but trouble starts when I try to connect between different languages.
Java -> C# and the reverse work fine as well [edited]
I do not get any errors, but it does not work either.
I first tried the PUB-SUB archetype pattern, but as that didn't work, I tried REQ-REP, so some remnants of the PUB-SUB version can still be found in the code.
My Python code looks like this :
def run(monitor: bool):
    loop_counter: int = 0
    context = zmq.Context()
    # socket = context.socket(zmq.PUB)
    # socket.bind("tcp://*:5557")
    socket = context.socket(zmq.REP)
    socket.connect("tcp://localhost:5557")
    if monitor:
        print("Connecting")
    # 0 = Longest version, 1 = shorter version, 2 = shortest version
    length_version: int = 0
    print("Ready and waiting for incoming requests ...")
    while True:
        message = socket.recv()
        if monitor:
            print("Received message:", message)
        if message == "long":
            length_version = 0
        elif message == "middle":
            length_version = 1
        else:
            length_version = 2
        sys_info = get_system_info(length_version)
        """if not length_version == 2:
            length_version = 2
            loop_counter += 1
            if loop_counter == 15:
                length_version = 1
            if loop_counter > 30:
                loop_counter = 0
                length_version = 0"""
        if monitor:
            print(sys_info)
        json_string = json.dumps(sys_info)
        print(json_string)
        socket.send_string(json_string)
My C# code :
static void Main(string[] args)
{
    //using (var requestSocket = new RequestSocket(">tcp://localhost:5557"))
    using (var requestSocket = new RequestSocket("tcp://localhost:5557"))
    {
        while (true)
        {
            Console.WriteLine($"Running the server ...");
            string msg = "short";
            requestSocket.SendFrame(msg);
            var message = requestSocket.ReceiveFrameString();
            Console.WriteLine($"requestSocket : Received '{message}'");
            //Console.ReadLine();
            Thread.Sleep(1_000);
        }
    }
}
Given the timing of your problems, maybe it's a version issue.
I have run a program fine for a long time with communication between Windows/C# using NetMQ 4.0.0.207 on one side and Ubuntu/Python with zeromq 4.3.1 and pyzmq 18.1.0 on the other.
I just tried keeping the same NetMQ version but updating to zeromq 4.3.3 and pyzmq 20.0.0, and there is a problem/bug somewhere; it doesn't run well anymore.
So your code doesn't look bad; maybe it's a software-version issue. Try NetMQ 4.0.0.207 on the C# side and zeromq 4.3.1 with pyzmq 18.1.0 on the Python side.
Q : "How to set up a ZeroMQ request-reply between a c# and python application"
The problem starts with the missed understanding of how REQ/REP archetype works.
Your code uses a blocking-form of the .recv()-method, so you remain yourselves hanging Out-of-the-Game, forever & unsalvageable, whenever a REQ/REP two-step gets into troubles (as no due care was taken to prevent this infinite live-lock).
Rather start using .poll()-method to start testing a presence / absence of a message in the local AccessNode-side of the queue and this leaves you in a capability to state-fully decide what to do next, if a message is already or is not yet present, so as to keep the mandatory sequence of an API-defined need to "zip" successful chainings ofREQ-side .send()-.recv()-.send()-.recv()-... with REP-side .recv()-.send()-.recv()-.send()-... calls, are the REQ/REP archetype works as a distributed-Finite-State-Automaton (dFSA), that may easily deadlock itself, due to "remote"-side not being compliant with the local-side expectations.
Having a code, that works in a non-blocking, .poll()-based mode avoids falling into these traps, as you may handle each of these unwanted circumstances while being still in a control of the code-execution paths (which a call to a blocking-mode method in a blind belief it will return at some future point in time, if ever, simply is not capable of).
Q.E.D.
If in doubts, one may use a PUSH/PULL archetype, as the PUB/SUB-archetype may run into problems with non-matching subscriptions ( topic-list management being another, version dependent detail ).
There ought be no other problem for any of the language-bindings, if they passed all the documented ZeroMQ API features without creating any "shortcuts" - some cases were seen, where language-specific binding took "another" direction for PUB/SUB, when sending a pure message, transformed into a multi-part message, putting a topic into a first frame and the message into the other. That is an example of a binding not compatible with the ZeroMQ API, where a cross-language / non-matching binding-version system problems are clear to come.
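For illustration, a minimal sketch of the non-blocking idea on the C#/NetMQ side (assuming NetMQ 4.x; the Python REP side can use socket.poll() analogously). The endpoint and the 2-second timeout are placeholders, not values from the question:

using System;
using NetMQ;
using NetMQ.Sockets;

static void Main()
{
    using (var requestSocket = new RequestSocket())
    {
        requestSocket.Connect("tcp://localhost:5557"); // one side must Bind, the other Connect
        requestSocket.SendFrame("short");

        string reply;
        if (requestSocket.TryReceiveFrameString(TimeSpan.FromSeconds(2), out reply))
        {
            Console.WriteLine($"Received '{reply}'");
        }
        else
        {
            // No reply arrived in time: the REQ socket is now stuck mid send/recv cycle.
            // Handle it here (log, close and recreate the socket) instead of blocking
            // forever in ReceiveFrameString().
            Console.WriteLine("No reply within 2 s");
        }
    }
}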
Your port numbers do not match: the Python code uses 55557 and the C# code uses 5557.
I might be late, but the same thing happened to me. I have a Python subscriber using pyzmq and a C# publisher using NetMQ.
After a few hours, it occurred to me that I needed to give the Publisher some time to connect. So a simple System.Threading.Thread.Sleep(500); after the Connect/Bind did the trick.
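For reference, a minimal sketch of that workaround on the NetMQ publisher side (topic and payload are made-up placeholders); the point is simply the pause between Bind/Connect and the first send:

using System;
using System.Threading;
using NetMQ;
using NetMQ.Sockets;

using (var publisher = new PublisherSocket())
{
    publisher.Bind("tcp://*:5557");
    Thread.Sleep(500); // give the subscriber time to connect before the first publish
    publisher.SendMoreFrame("status").SendFrame("hello");
}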

Performance of an Azure Function with multiple output bindings

Hello all who read this,
We have written a router function in Azure, on an App Service plan, that receives messages from IoT Hub
and, depending on the message type, routes the message to another Event Hub.
Previously we had 6 output bindings to Event Hubs in this function.
Recently we added 3 more message types, so 3 more output bindings to 3 more Event Hubs.
No processing of the messages happens in this function, but what we see now is that we spend 16 times more time in the routing function.
Is there a performance issue with having multiple output bindings?
We don't see an increase in the load of incoming messages.
We are running on Azure Functions 1.0 (Runtime version: 1.0.12205.0 (~1)).
Regards, Ben
Simplified sample code of the routing function:
public static class IotHubRouterFunction
{
    [FunctionName("IotHubRouterFunction")]
    public static void Run([EventHubTrigger("%iothub%", Connection = "IothubRouterListen")] EventData myEventHubData,
        [EventHub("%msg1-eventhub%", Connection = "msg1event")] ICollector<EventData> eventHub4Dmsg1Event,
        [EventHub("%msg2-eventhub%", Connection = "msg2event")] ICollector<EventData> eventHub4Dmsg2Event,
        [EventHub("%msg3-eventhub%", Connection = "msg3event")] ICollector<EventData> eventHub4Dmsg3Event,
        //... like 6 more bindings like this
        ILogger logger
        )
    {
        try
        {
            var messageType = GetValue(myEventHubData.Properties, "type");

            // routing: forward the incoming event to the output collector matching its type
            switch (messageType)
            {
                case "msg1event":
                {
                    eventHub4Dmsg1Event.Add(myEventHubData);
                    break;
                }
                case "msg2event":
                {
                    eventHub4Dmsg2Event.Add(myEventHubData);
                    break;
                }
                case "msg3event":
                {
                    eventHub4Dmsg3Event.Add(myEventHubData);
                    break;
                }
                //6 more cases like this
                default:
                {
                    logger.LogError("Unrouteable message of type: {messageType}", messageType);
                    break;
                }
            }
        }
        catch (Exception ex)
        {
            //removed
        }
    }
}
With 6 bindings, messages fly through the router function in 50 ms.
With 9 bindings, messages crawl through the router function at 800 ms.
CPU also rose by 30% on the App Service plan (we scaled out extra, so we have it under control, but why so much? What is causing this?)
A little late with the follow-up of what happened.
In the end we found out what was going on.
We have several instances in our App Service plan,
but the old monitoring solution showed the average CPU and memory across all the instances of the plan.
By switching to the newer metrics and Azure monitoring, we were able to drill down into the separate instances of the App Service plan and the instances of the functions.
We found that one of the functions was running on three instances; two of them ran normally, but the third had crashed its internal app pool, consumed all the CPU power it could get hold of, and did absolutely nothing.
We restarted the function and all issues were gone.
Still wondering whether it was something in our code that made it go through the roof,
or whether something happened in Azure that made it go crazy.
:-s
When you are using an Azure Function on an App Service plan, you have to watch out for performance parameters like scaling. Have you investigated whether your function is getting overloaded?
On the other hand, as part of your design this approach looks wrong to me. With this many bindings there could be potential performance issues, and what if you need to add more bindings in the future? If you are not performing any operation, you shouldn't be taking on the overhead of redirecting messages.
Event Grid
We can use Event Grid for that. The IoT Hub publishes the event to a topic, and the events are consumed by subscribers (in your case, the other Event Hubs). You also get the advantages of micro-billing (serverless) and auto-scaling. https://learn.microsoft.com/en-us/azure/event-grid/overview

Using zmq_connect on a port before zmq_bind returns success

I'm using the ZeroMQ 3.2.0 C++ library. I use zmq_connect to connect to a port before zmq_bind has been called, but the function returns success. How can I know that the connect failed? My code is:
void *ctx = zmq_ctx_new();
void *skt = zmq_socket(ctx, ZMQ_SUB);
int ret = zmq_connect(skt, "tcp://192.168.9.97:5561"); // 192.168.9.97:5561 is not bound
// zmq_connect returns zero
This is actually a feature of ZeroMQ: connection status and so on is abstracted away from you. There is no exposed information you can check to see whether you're connected or not, AFAIK. This means that you can connect even if the server is temporarily down, and ZeroMQ will handle everything when the server becomes available later. This can be both a blessing and a curse.
What most people end up doing if they need to know connection status is to implement some sort of heartbeat, REQ/REP ping/pong for example.
Have a look at the lazy pirate pattern for an example of how to ensure reliability from a client perspective.
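As a rough illustration of the heartbeat idea (shown here with NetMQ in C# rather than the libzmq C API used in the question; the endpoint comes from the question, everything else is hypothetical), a separate REQ/REP ping channel with a receive timeout tells you whether the peer is actually reachable:

using System;
using NetMQ;
using NetMQ.Sockets;

static bool PeerIsAlive(string endpoint)
{
    using (var ping = new RequestSocket())
    {
        ping.Options.Linger = TimeSpan.Zero;  // don't linger on Dispose
        ping.Connect(endpoint);
        ping.SendFrame("ping");

        string pong;
        // No "pong" within a second: treat the connection as down and recreate the
        // socket before retrying (the "lazy pirate" client pattern).
        return ping.TryReceiveFrameString(TimeSpan.FromSeconds(1), out pong);
    }
}

// e.g. PeerIsAlive("tcp://192.168.9.97:5561")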

Akka Camel - JMS messages lost - should wait for initialization of Camel?

My experimental application is quite simple; I'm trying out what can be done with actors and Akka.
After the JVM starts, it creates an actor system with a couple of plain actors, a JMS consumer (akka.camel.Consumer), and a JMS producer (akka.camel.Producer). It sends a couple of messages between actors and also JMS producer -> JMS server -> JMS consumer. It basically talks to itself via the JMS service.
From time to time I was experiencing weird behaviour: the first of the messages that were supposed to be sent to the JMS server was somehow lost. Looking at my application logs, I could see that the application was trying to send the message, but it was never received by the JMS server. (For each run I have to start the JVM and the application again.)
The Akka Camel documentation mentions that it's possible that some components may not be fully initialized at the beginning: "Some Camel components can take a while to startup, and in some cases you might want to know when the endpoints are activated and ready to be used."
I tried to implement the following to wait for Camel initialization:
val system = ActorSystem("actor-system")
val camel = CamelExtension(system)
val jmsConsumer = system.actorOf(Props[JMSConsumer])
val activationFuture = camel.activationFutureFor(jmsConsumer)(timeout = 10 seconds, executor = system.dispatcher)
val result = Await.result(activationFuture,10 seconds)
which seems to help with this issue. (Although, when removing this step now, I'm not able to recreate this issue any more... :/).
My question is whether this is the correct way to ensure all components are fully initialized.
Should I use
val future = camel.activationFutureFor(actor)(timeout = 10 seconds, executor = system.dispatcher)
Await.result(future, 10 seconds)
for each akka.camel.Producer and akka.camel.Consumer actor to be sure that everything is initialized properly?
Is that all I should do, or should something else be done as well? The documentation is not clear on that, and it's not easy to test, as the issue was happening only occasionally...
You need to initialize the Camel JMS component, and also the Producer, before sending any messages.
import static java.util.concurrent.TimeUnit.SECONDS;

import scala.concurrent.Future;
import scala.concurrent.duration.Duration;

import akka.actor.ActorRef;
import akka.dispatch.OnComplete;
import akka.util.Timeout;

ActorRef producer = system.actorOf(new Props(SimpleProducer.class), "simpleproducer");

Timeout timeout = new Timeout(Duration.create(15, SECONDS));
Future<ActorRef> activationFuture = camel.activationFutureFor(producer, timeout, system.dispatcher());

activationFuture.onComplete(new OnComplete<ActorRef>() {
    @Override
    public void onComplete(Throwable failure, ActorRef activatedProducer) throws Throwable {
        // only send once the producer's Camel endpoint has been activated
        producer.tell("First!!");
    }
}, system.dispatcher());
