MSMQ performance test: stress-loading a server with only a few workstations

I am trying to run some MSMQ performance tests on a Windows Server 2008 R2 server. Ideally, I would like to simulate several thousand workstations, each sending 5 msgs/sec.
One way to do so is to use Amazon, but I am wondering whether this could be done some other way.
I thought that a custom tool sending a large number of messages from a single workstation could do the job, but it seems that some internal mechanisms prevent a true real-life representation of what I am trying to simulate. I can send 2,000 msgs/sec from one workstation, but because of the OUTGOING queue and other mechanisms, the server seems to swallow everything in large chunks (and I only see, at best, 50 msgs/sec peaks).
I believe there must be some kind of per-workstation overhead before data is sent to the server, which I lose by simulating only a single workstation (or a few more).
Any ideas?
P.S. I am using a private transactional queue, Windows 7 on the workstations, and MSMQ 5.
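For scale: taking 5,000 as an example figure, 5,000 workstations × 5 msgs/sec is 25,000 msgs/sec in aggregate. One part of the gap between a single bursting tool and many real clients is pacing. Below is a minimal Python sketch (the function name is hypothetical) that builds a smooth send schedule by staggering each simulated client's start offset. It only addresses pacing; it does not reproduce the per-client network connections and kernel memory that the answer below talks about.

```python
def staggered_schedule(num_clients, rate_per_client, duration_s):
    """Return sorted (time_s, client_id) send events for num_clients simulated
    clients, each sending rate_per_client msgs/sec for duration_s seconds.
    Start offsets are staggered so the aggregate load is smooth, not bursty."""
    interval = 1.0 / rate_per_client                     # gap between one client's sends
    sends_per_client = round(rate_per_client * duration_s)
    events = []
    for client in range(num_clients):
        offset = interval * client / num_clients         # spread starts over one interval
        for k in range(sends_per_client):
            events.append((offset + k * interval, client))
    events.sort()
    return events


# Example: 3 simulated clients at 5 msgs/sec for 2 seconds.
schedule = staggered_schedule(3, 5, 2)
print(len(schedule), "sends; first few:", schedule[:4])
```

A load generator would then sleep until each event time and send from that client's own connection. For thousands of clients that still means thousands of distinct connections (and IP addresses), which is the hard part.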

Simulating throughput is easy, but it is incredibly hard to simulate multiple MSMQ clients. Each client has a unique IP address and its own client-queue-manager-to-server-queue-manager network connection. Using Amazon to generate a large number of Windows client instances would do the trick, but I haven't seen any solution that works on standard PCs.
The overhead you can't simulate with just sending lots of messages is the kernel memory used by the network connection and the threads used to handle incoming messages. Network connections are very expensive and eventually the server will fail if you have too many simultaneously connected clients.
As you are continuously sending messages, each client will have a persistent connection to the server. This is good for speed but bad for memory/thread usage. 5,000 clients will require a powerful 64-bit server.
So, what's the limit on connections to an MSMQ server?
"What are MSMQ's limits?" If I had a farthing for every time...
Insufficient Resources? Run away, run away!
FIX: Kernel-pool memory may become exhausted when many clients connect to Message Queuing

Related

What's the maximum throughput per instance that can be achieved in IBM MQ Advanced for Developers?

I am currently using an IBM MQ Advanced for Developers server for testing our client and was able to achieve around 1000 messages per second using the sample consumer written in JMS, which seems pretty slow. Is this a limit of the developer server, and if so, what throughput can be achieved with a licensed production IBM MQ server?
There is no artificial limit associated with IBM MQ Advanced for Developers. It is the same as the licensed production version of IBM MQ.
You don't say what type of machine you were using, what persistence your messages were, what size they were, or any other qualifying criteria.
You say client, but I don't know whether you mean "network attached application" or "driving application". Clearly, if your program is running "client-attached" (MQ parlance for network attached), then network performance will also come into play.
On my Windows laptop, I get 4500 non-persistent msgs/sec, or 2000 persistent msgs/sec using a simple C-language locally bound program. Over client connection (just using localhost, not actually going out over a real network connection) I get 2700 non-persistent msgs/sec, or 1500 persistent msgs/sec.
You should read the MQ Performance Reports for details of the expected rates you can get.
As an ex MQ performance person I would say - it depends.
At one level you can ask: what can one application process in isolation?
For persistent messages this will come down to the rate at which you can write to the log files.
If you have 10 applications in parallel each putting and getting from their own queue, then you will not get 10 times the throughput - you might get 8 or 9 times the throughput.
If they are all processing the same queue, then the throughput may drop a bit more as the queue usage is serialised.
If only one application is writing to the log, the application may see a 1 millisecond response time. If you have 10 applications running concurrently, they may see a 3 millisecond response time, so individual throughput goes down, but with more threads the overall throughput goes up.
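The arithmetic behind this is Little's law: throughput = concurrency / response time, assuming each application sends its next request as soon as the previous one completes. A small Python sketch using the figures from the paragraph above (1 ms for one application, 3 ms for ten):

```python
def throughput_per_sec(concurrent_apps, response_time_s):
    """Little's law for a closed system: each app issues its next request as
    soon as the previous one completes, so throughput = concurrency / latency."""
    return concurrent_apps / response_time_s


one = throughput_per_sec(1, 0.001)   # one application, 1 ms per log write
ten = throughput_per_sec(10, 0.003)  # ten applications, 3 ms each
print(f"1 app:   {one:.0f} msgs/sec")
print(f"10 apps: {ten:.0f} msgs/sec total, {ten / 10:.0f} msgs/sec each")
```

So each application drops from about 1,000 to about 333 msgs/sec, while the total rises to about 3,333 msgs/sec.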
If you have requests coming in over the network, you need to add network time, but you can run more clients and so get improved throughput.
If your application has a delay built in - it may only process a low message rate. You can have lots (1000s) of these and get a high >overall< throughput.
If your application is putting and getting as fast as possible, you may find that you can run 10-100 instances before the throughput plateaus.
Let's say you want to run your box so it is using 75% of the CPU, and the logging is 50% busy.
If you have just MQ on the box, then this can run more messages than if you had DB2 on the box (with DB2 using 50% of the CPU)
If you have an application (DB2) hammering the disk, then the MQ throughput will go down.
If you have lots of applications putting to a server queue - and one server program, you will find the throughput is limited by the rate at which the server can process work. If it is doing DB2 work, it will be slower than no DB2 work. If you find the server queue depth is over 5 then you need more server instances.
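A rough Python sketch of the capacity arithmetic behind this, assuming you can measure the arrival rate and the average time the server program spends per message. Both function names are hypothetical, the 75% target simply echoes the CPU figure above, and the 2 ms per message in the example is made up. The depth check just encodes the "over 5" rule of thumb.

```python
import math


def server_instances_needed(arrival_per_sec, service_time_s, target_utilization=0.75):
    """Instances needed so each server program stays at or below the target
    utilization. Offered load (in busy instances) = arrival rate x service time."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    offered_load = arrival_per_sec * service_time_s
    return max(1, math.ceil(offered_load / target_utilization))


def should_add_instance(recent_depths, threshold=5):
    """Rule of thumb from the text: if the server queue depth stays above ~5,
    the server program is not keeping up. Here 'stays' means every sample."""
    return bool(recent_depths) and min(recent_depths) > threshold


# Hypothetical: 2,000 requests/sec, 2 ms of work (e.g. DB2 calls) per message.
print(server_instances_needed(2000, 0.002))
print(should_add_instance([7, 9, 12]))
```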
As Morag said, see the performance reports, but they are not the clearest reports to understand.

Unable to download large files from a Jetty server

I ran a few download tests against a Jetty 9 server, making multiple simultaneous downloads of a single file of approximately 80 MB. With a smaller number of downloads, where none runs past 55 seconds, all of them usually complete. However, if any download is still in progress after 55 seconds, its network flow simply stops and nothing more is transferred.
I have already tried setting Jetty's timeout and buffer size, but this has not worked. Has anyone had this problem, or does anyone have suggestions on how to solve it? The same tests against IIS and Apache work very well. I am using JMeter for testing.
Marcus, maybe you are just hitting Jetty bug 472621?
Edit: The mentioned bug is a separate timeout in Jetty that applies to the total operation, not just idle time. So by setting the http.timeout property you essentially define a maximum time any download is allowed to take, which in turn may cause timeout errors for slow clients and/or large downloads.
Cheers,
momo
A timeout means your client isn't reading fast enough.
JMeter isn't reading the response data fast enough, so the connection sits idle long enough that it idle times out and disconnects.
We test with 800MB and 2GB files regularly:
Using HTTP/1.0, HTTP/1.1, and HTTP/2.
Using normal (plaintext) connections and secured TLS connections.
With responses delivered in as many Transfer-Encodings and Content-Encodings as we can think of (compressed, gzip, chunked, ranged, etc.).
We do all of these tests using our own test infrastructure, often spinning up many Amazon EC2 nodes to run a load test that can sufficiently stress the server (a typical test is 20 client nodes to 1 server node).
When testing large responses, you'll need to be aware of the protocol (HTTP/1.x vs HTTP/2) and how the persistence behavior of that protocol can change request/response latency. In the real world you won't have multiple large requests one after another on the same persistent connection via HTTP/1 (on HTTP/2 the multiple requests would be parallel and sent at the same time).
Be sure you set up JMeter to use HTTP/1.1 and not to use persistent connections (see the JMeter documentation for help on that).
Also be aware of the bandwidth available for your testing. It's very common to blame a server (any server) for not performing fast enough when the test itself is sloppily set up and has expectations that far exceed the bandwidth of the network itself.
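To make the bandwidth point concrete, here is a back-of-the-envelope Python sketch. The 10 concurrent downloads and the 100 Mbit/s link are hypothetical figures, not taken from the question; the point is that with fair sharing, each 80 MB download can take longer than 55 seconds before Jetty is even involved.

```python
def seconds_per_download(file_mb, concurrent_downloads, link_mbit_per_s):
    """Time for each of N simultaneous downloads to finish when they share one
    link evenly. Ignores TCP/HTTP overhead, so real times will be longer."""
    file_mbit = file_mb * 8                               # decimal MB -> megabits
    per_download_mbit_per_s = link_mbit_per_s / concurrent_downloads
    return file_mbit / per_download_mbit_per_s


# Hypothetical: ten 80 MB downloads sharing a 100 Mbit/s link.
print(seconds_per_download(80, 10, 100))
```

With those assumed numbers every download needs 64 seconds, already past the 55-second mark described in the question.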
Next, don't test from the same machine; this sort of load test needs multiple machines (1 for the server, and 4+ for the clients).
Lastly, when load testing, you'll want to become intimately familiar with the network configuration of your server (and, to a lesser extent, your client test machines) to tune it for high load. Default OS configurations are rarely sufficient for proper load testing.

How to efficiently handle thousands of keep alive connections in Go?

Using Go's net/http server to handle connections, is there a pattern to better handle 10,000 keep-alive connections with relatively few requests per second each?
My benchmark performance with a tool like wrk is 50,000 requests per second, but with real traffic (from real-time bidding exchanges) I have a hard time beating 8,000 requests per second.
I know connection multiplexing from a hardware loadbalancer is possible, but it seems like the same type of pattern can be achieved in Go.
You can distribute load on local and remote servers using an IPC protocol like JSON RPC through e.g. UNIX and TCP sockets.
Related: Go Inter-Process Communication
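To illustrate the framing such an IPC link typically uses, here is a minimal Python sketch of newline-delimited JSON-RPC 2.0 over a socket pair, which stands in for a UNIX or TCP socket between two processes. The method table and function names are hypothetical. In Go itself, the standard library's net/rpc/jsonrpc package provides a JSON-RPC 1.0 codec for net/rpc, so you would not normally hand-roll this.

```python
import json
import socket
import threading

METHODS = {"add": lambda a, b: a + b}  # hypothetical method table


def serve(conn):
    """Answer newline-delimited JSON-RPC 2.0 requests until the peer closes."""
    with conn, conn.makefile("r", encoding="utf-8") as rf, \
            conn.makefile("w", encoding="utf-8") as wf:
        for line in rf:
            req = json.loads(line)
            result = METHODS[req["method"]](*req["params"])
            wf.write(json.dumps({"jsonrpc": "2.0", "id": req["id"],
                                 "result": result}) + "\n")
            wf.flush()


def call(rf, wf, req_id, method, params):
    """Send one request and block until its response line arrives."""
    wf.write(json.dumps({"jsonrpc": "2.0", "id": req_id,
                         "method": method, "params": params}) + "\n")
    wf.flush()
    reply = json.loads(rf.readline())
    if reply.get("id") != req_id:
        raise RuntimeError("response id does not match request id")
    return reply["result"]


# socketpair() stands in for a UNIX or TCP socket between two processes.
client, server = socket.socketpair()
threading.Thread(target=serve, args=(server,), daemon=True).start()
with client, client.makefile("r", encoding="utf-8") as rf, \
        client.makefile("w", encoding="utf-8") as wf:
    result = call(rf, wf, 1, "add", [2, 3])
print(result)
```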
As to the performance bottleneck; it has been discussed extensively on the go-nuts mailing list. At the time of writing it is the runtime's goroutine scheduler and world-stopping garbage collector.
The core team has recently made major improvements to the runtime to alleviate this problem, yet there is still room for improvement. To quote one example:
Due to tighter coupling of the run-time and network libraries, fewer context switches are required on network operations.

Is there a way for Asterisk to reconnect calls when the internet connection is lost?

To be specific, I am using Asterisk with a Heartbeat active/passive cluster. There are 2 nodes in the cluster; let's call them Asterisk1 and Asterisk2. Everything is well configured in my cluster. When one of the nodes loses its internet connection, the Asterisk service fails, or Asterisk1 is turned off, the Asterisk service and the failover IP migrate to the surviving node (Asterisk2).
The problem is that if we were processing a call when Asterisk1 went down, Asterisk drops the call, and I can only redial once the Asterisk service is up on Asterisk2 (5 seconds, not a bad time).
But my question is: is there a way to make Asterisk work like Skype when it loses the connection during a call? I mean, instead of dropping the call, keep trying to reconnect it, and reconnect it once the Asterisk service is up on Asterisk2?
There are some commercial systems that support such behavior.
If you want to do it on a non-commercial system, there are 2 ways:
1) Force a call back to all phones with the autoanswer flag. Requirement: guru-level Asterisk knowledge.
2) Use Xen and a memory mapping/mirroring system to maintain a VPS on the other node with the same memory state (the same running Asterisk). Requirement: guru-level Xen knowledge. See for example: http://adrianotto.com/2009/11/remus-project-full-memory-mirroring/
Sorry, both methods require guru-level knowledge.
Note: if you run SIP through an OpenVPN tunnel, you very likely won't lose calls inside the tunnel if the internet goes down for up to 20 seconds. That is not exactly what you asked, but it can work.
Since there is no accepted answer after almost 2 years I'll provide one: NO. Here's why.
If you fail over from Asterisk server 1 to Asterisk server 2, then Asterisk server 2 has no idea what calls (i.e. endpoint to endpoint) were in progress (even if you share a database of called numbers, use Asterisk Realtime, etc.). If Asterisk tried to bring up both legs of the call to the same numbers, these might not be the same endpoints of the call.
Another server cannot resume the SIP TCP session of the other server since it closed with the last server.
The MAC source/destination ports may be identical and your firewall will not know you are trying to continue the same session.
etc.
If your goal is high availability of phone services, take a look at the VoIP Info web site. All the rest (network redundancy, disk redundancy, shared block storage devices, router failover protocol, etc.) is a distraction... focus instead on early DETECTION of failures across all trunks/routes/devices involved in providing phone service, and then on providing the highest degree of recovery without sharing ANY DEVICES. (Too many HA solutions share a disk, channel bank, etc., which creates a single point of failure.)
Your solution would require a shared database that is updated in real time on both servers. The database would be managed by an event logger that keeps track of all calls in progress, flagged as LINEUP perhaps. In the event a failure is detected, all calls that were on the failed server would be flagged as DROPPEDCALL. When your fail-over server spins up and takes over (using heartbeat monitoring or some such), the first thing it would do is generate a set of call files from all database records flagged as DROPPEDCALL. These calls can then be conferenced together.
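A minimal Python sketch of that bookkeeping, with an in-memory SQLite database standing in for the shared, replicated one. The table layout, the RING/HANGUP event handling, and all function names are hypothetical; the LINEUP and DROPPEDCALL states come from the paragraph above. The call-file keys (Channel, Context, Extension, Priority, Setvar) are standard Asterisk call-file fields, but the "recovery" context, the RECOVERY_ROOM variable, and SIP peers named after their extension numbers are assumptions; you would need a matching dialplan that puts both legs into one conference. The sketch only builds the file bodies; on a real system they would typically be written elsewhere and then moved into Asterisk's outgoing spool directory (usually /var/spool/asterisk/outgoing).

```python
import sqlite3


def open_db():
    """In-memory stand-in for the shared, replicated call-state database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE calls (uniqueid TEXT PRIMARY KEY, server TEXT, "
               "caller TEXT, callee TEXT, state TEXT)")
    return db


def on_event(db, event, uniqueid, server=None, caller=None, callee=None):
    """Event logger: a RING records the call as LINEUP, a HANGUP removes it.
    Missing either event leaves a 'ghost' row that would be redialed."""
    if event == "RING":
        db.execute("INSERT OR REPLACE INTO calls VALUES (?, ?, ?, ?, 'LINEUP')",
                   (uniqueid, server, caller, callee))
    elif event == "HANGUP":
        db.execute("DELETE FROM calls WHERE uniqueid = ?", (uniqueid,))
    db.commit()


def mark_failed(db, server):
    """Called when heartbeat monitoring detects that `server` is down."""
    db.execute("UPDATE calls SET state = 'DROPPEDCALL' "
               "WHERE server = ? AND state = 'LINEUP'", (server,))
    db.commit()


def recovery_call_files(db):
    """Build call-file bodies, one per leg of each dropped call. Both legs get
    the same RECOVERY_ROOM so a (hypothetical) 'recovery' dialplan context can
    put them into one conference."""
    files = {}
    rows = db.execute("SELECT uniqueid, caller, callee FROM calls "
                      "WHERE state = 'DROPPEDCALL'").fetchall()
    for uniqueid, caller, callee in rows:
        for leg in (caller, callee):
            files[f"{uniqueid}-{leg}.call"] = (
                f"Channel: SIP/{leg}\n"
                "Context: recovery\n"
                "Extension: s\n"
                "Priority: 1\n"
                f"Setvar: RECOVERY_ROOM={uniqueid}\n")
    return files


db = open_db()
on_event(db, "RING", "c1", "ast1", "101", "102")
on_event(db, "RING", "c2", "ast1", "103", "104")
on_event(db, "HANGUP", "c2")
on_event(db, "RING", "c3", "ast2", "105", "106")
mark_failed(db, "ast1")
for name, body in recovery_call_files(db).items():
    print(name)
    print(body)
```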
The hardest part about it is the event monitor, ensuring that you don't miss any RING or HANGUP events, potentially leaving a "ghost" call in the system to be erroneously dialed in a recovery operation.
You likely should also have a mechanism to build your Asterisk config on a "management" machine that then pushes changes out to your farm of call-manager AST boxen. That way any node is replaceable with any other.
What you should likely have is 2 DB servers using replication techniques and Linux High-Availability (LHA) (1). Alternatively, DNS round-robin or load balancing with a "public" IP would do well, too. These machines will likely be lightly loaded enough to host your configuration manager as well, with the benefit of getting LHA for "free".
Then, at least N+1 AST Boxen for call handling. N is the number of calls you plan on handling per second divided by 300. The "+1" is your fail-over node. Using node-polling, you can then set up a mechanism where the fail-over node adopts the identity of the failed machine by pulling the correct configuration from the config manager.
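The N+1 sizing rule above, as a tiny Python sketch. The 300 calls-per-node divisor is the answer's own figure; the function name is hypothetical.

```python
import math


def asterisk_nodes(calls_per_sec, calls_per_node=300):
    """N + 1 sizing: N = calls per second / calls_per_node (the answer's figure
    of 300), rounded up, plus one fail-over node."""
    return math.ceil(calls_per_sec / calls_per_node) + 1


for load in (100, 900, 901):
    print(load, "calls/sec ->", asterisk_nodes(load), "nodes")
```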
If hardware is cheap/free, then 1:1 LHA node redundancy is always an option. However, generally speaking, the failure rate for PC hardware and Asterisk software is fairly low: 3 or 4 "9s" out of the box. So, really, you're trying to go the last bit of distance to the "5th 9".
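To put the "9s" into concrete terms, a short Python sketch converting a number of nines into allowed downtime per year (using a 365-day year):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def downtime_minutes_per_year(nines):
    """Allowed downtime per year for an availability of `nines` nines
    (3 -> 99.9%, 5 -> 99.999%)."""
    return MINUTES_PER_YEAR * 10 ** -nines


for n in (3, 4, 5):
    print(f"{n} nines: {downtime_minutes_per_year(n):.1f} minutes/year")
```

Three nines allows roughly 8.8 hours of downtime a year; five nines allows only about 5.3 minutes.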
I hope that gives you some ideas about which way to go. Let me know if you have any questions, and please take the time to "accept" whichever answer does what you need.
(1) http://www.linuxjournal.com/content/ahead-pack-pacemaker-high-availability-stack

How slow are TCP sockets compared to named pipes on Windows for localhost IPC?

I am developing a TCP Proxy to be put in front of a TCP service that should handle between 500 and 1000 active connections from the wild Internet.
The proxy is running on the same machine as the service, and is mostly-transparent. The service is for the most part unaware of the proxy, the only exception being the notification of the real remote IP address of the clients.
This means that, for every inbound open TCP socket, there are two more sockets on the server: the second of the pair in the proxy, and the one on the real service behind the proxy.
The send and recv window sizes on the two Proxy sockets are set to 1024 bytes.
What are the performance implications of this? How slow is this configuration? Should I put some effort into changing the service to use named pipes (or another IPC mechanism), or is a localhost TCP socket, for the most part, an efficient IPC mechanism?
Merging the two apps is not an option. Right now we are stuck with the two-process configuration.
EDIT: The reason for having two separate processes on the same hardware is 100% economics. We have only one server, and we are not planning on getting more (no money).
The TCP service is a legacy software in Visual Basic 6 which grew beyond our expectations. The proxy is C++. We don't have the time, money nor manpower to rewrite and migrate the VB6 code to a modern programming environment.
The proxy is our attempt to mitigate a specific performance issue on the service, a DDoS attack we are getting from time to time.
The proxy is open source, and here is the project source code.
It will be the same (or at least not measurably different). Winsock is smart enough to know if it's talking to a socket on the same host and, in that case, it will short-circuit pretty much everything below IP and copy data directly buffer-to-buffer. In terms of named pipes vs. sockets, if you need to potentially be able to communicate to different machines ever in the future, choose sockets. If you know for a fact that you'll never need to do that, pick whichever one your developers are most familiar or most comfortable with.
For anyone that comes to read this later, I want to add some findings that answer the original question.
For a utility we are developing we have a networking class that can use named pipes, or TCP with the same calls.
Here is a typical loop back file transfer on our test system:
TCP/IP Transfer time: 2.5 Seconds
Named Pipes Transfer time: 3.1 Seconds
Now, if you go outside the machine and connect to a remote computer on your network the performance for named pipes is much worse:
TCP/IP Transfer time: 12 Seconds
Named Pipes Transfer time: 2.5 Minutes (Yes Minutes!)
I realize that this is just one system (Windows 7), but I think it is a good indicator of how slow named pipes can be... and it seems like TCP is the way to go.
I know this topic is very old, but it was still relevant for me, and maybe others will look at this in the future as well.
I implemented IPC between Excel (VBA) and another process on the same machine, both via a TCP connection as well as via Named Pipes.
In a quick performance test, I sent a message that consisted of 26 bytes from the client (Excel) to the server (not Excel), and waited for the reply message from the other process (which consisted of 12 bytes in the example).
I executed this a ton of times in a loop and measured the average execution time.
With TCP on localhost (Windows 7, no loopback fast path), one "conversation" (request + reply) took around 300-350 microseconds. Sending data was especially slow (sending the 26 bytes took around 200 microseconds via TCP).
With named pipes, one conversation took around 60 microseconds on average, so a LOT faster.
I'm not entirely sure why the difference was so large. The corporate environment I tested this in has a strict firewall, packet inspection and whatnot, so I THINK even the localhost TCP connection may have gone through security measures that slowed it down significantly, while the named pipe connections likely did not.
TL;DR: In my case, named pipes were around 5-6 times faster than TCP for small packets (I have not tested bigger ones yet).
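If you want to repeat this kind of measurement on your own machine, here is a minimal Python sketch of the TCP half: a 26-byte request and a 12-byte reply over loopback, averaged over many round trips. Function names are hypothetical, TCP_NODELAY is set on both ends so Nagle's algorithm does not distort the timing, and absolute numbers will depend heavily on the machine, OS, and any security software. On Windows, multiprocessing.connection with family 'AF_PIPE' could serve as a named-pipe counterpart, though it adds its own message framing.

```python
import socket
import threading
import time


def recv_exact(sock, n):
    """Read exactly n bytes, or return None if the peer closes first."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            return None
        buf += chunk
    return bytes(buf)


def reply_server(listener, request_size, reply_size):
    """Accept one client; answer every request_size-byte message with reply_size bytes."""
    conn, _ = listener.accept()
    with conn:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        while recv_exact(conn, request_size) is not None:
            conn.sendall(b"r" * reply_size)


def mean_round_trip_us(iterations=2000, request_size=26, reply_size=12):
    """Average request+reply time over loopback TCP, in microseconds."""
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        server = threading.Thread(target=reply_server,
                                  args=(listener, request_size, reply_size))
        server.start()
        with socket.create_connection(("127.0.0.1", port)) as sock:
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            payload = b"q" * request_size
            start = time.perf_counter()
            for _ in range(iterations):
                sock.sendall(payload)
                if recv_exact(sock, reply_size) is None:
                    raise ConnectionError("server closed the connection early")
            elapsed = time.perf_counter() - start
        server.join()  # the server loop ends once the client socket is closed
    return elapsed / iterations * 1e6


print(f"mean round trip: {mean_round_trip_us():.1f} us")
```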
http://msdn.microsoft.com/en-us/library/aa178138(v=sql.80).aspx
Let me sum it up for you. If you are worried about performance, then use TCP/IP. But if you have a really fast network and you're not worried about performance, then named pipes would be "neat" in that they might save you some code.
Not to mention, if you stick to TCP then you will have something that can be scaled, and even load balanced when the time comes.
Cheers,
In the scenario you describe, the local TCP connections are very unlikely to be a bottleneck. It will introduce some overhead, of course, but this should be negligible unless your CPU is already running hot.
At a guess, if your server's CPU usage is normally below 50% or so (with the proxy in place) it isn't worth worrying about minimizing the overhead associated with the local TCP connections.
If CPU usage is regularly above 80% you should probably be doing some profiling. I'd start by comparing the CPU load (or, better still, the performance, if you can measure it meaningfully) when the proxy is in place to when it isn't. Unless the proxy is doing some complicated processing, the overhead associated with the extra TCP connections is probably a significant fraction of the total overhead introduced by the proxy, so that should give you at least an order-of-magnitude estimate of how much you'd gain by using a more efficient form of IPC.
What is the reason to have a proxy on the SAME machine, just curious?
Anyway:
There are several methods for IPC; TCP/IP and named pipes are comparable in speed and complexity. If you really want something that scales well and has almost no overhead, use shared memory. It is best used in combination with a lock-free algorithm for advancing the pointers (or use one buffer for each reader (the proxy/the service) and writer (the service/the proxy)).
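A minimal Python sketch of the "one buffer per direction" idea: a single-producer/single-consumer ring buffer living in a shared-memory block. Only the producer advances the head index and only the consumer advances the tail, so the index logic needs no lock. The class and layout are hypothetical, and this is only an illustration of the pointer arithmetic: Python gives no memory-ordering guarantees across processes, so a real version (e.g. in the C++ proxy) would use atomics with release/acquire ordering. A consumer process would attach with SpscRing(capacity, name=ring.shm.name, create=False).

```python
import struct
from multiprocessing import shared_memory

INDEX = struct.Struct("<Q")   # one unsigned 64-bit counter
HEAD_OFF, TAIL_OFF, DATA_OFF = 0, 8, 16


class SpscRing:
    """Single-producer/single-consumer byte ring in shared memory.
    Only the producer writes `head` and only the consumer writes `tail`, so the
    index logic needs no lock. Python offers no memory-ordering guarantees here;
    a C++ version would use std::atomic with release/acquire ordering."""

    def __init__(self, capacity, name=None, create=True):
        self.capacity = capacity
        self.shm = shared_memory.SharedMemory(name=name, create=create,
                                              size=DATA_OFF + capacity)
        if create:
            INDEX.pack_into(self.shm.buf, HEAD_OFF, 0)
            INDEX.pack_into(self.shm.buf, TAIL_OFF, 0)

    def _load(self, offset):
        return INDEX.unpack_from(self.shm.buf, offset)[0]

    def write(self, data):
        """Producer side. Returns False (writes nothing) if there is no room."""
        head, tail = self._load(HEAD_OFF), self._load(TAIL_OFF)
        if len(data) > self.capacity - (head - tail):
            return False
        for i, byte in enumerate(data):
            self.shm.buf[DATA_OFF + (head + i) % self.capacity] = byte
        INDEX.pack_into(self.shm.buf, HEAD_OFF, head + len(data))  # publish last
        return True

    def read(self, max_bytes):
        """Consumer side. Returns up to max_bytes of the oldest unread data."""
        head, tail = self._load(HEAD_OFF), self._load(TAIL_OFF)
        n = min(max_bytes, head - tail)
        out = bytes(self.shm.buf[DATA_OFF + (tail + i) % self.capacity]
                    for i in range(n))
        INDEX.pack_into(self.shm.buf, TAIL_OFF, tail + n)  # free the space last
        return out

    def close(self, unlink=False):
        self.shm.close()
        if unlink:
            self.shm.unlink()


ring = SpscRing(8)
try:
    ring.write(b"hello")
    print(ring.read(3))
finally:
    ring.close(unlink=True)
```

The counters grow monotonically and are reduced modulo the capacity only when indexing, so head - tail is always the number of unread bytes, even after wrap-around.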
