Multi-threaded Windows Service - Erlang - windows

I'd like to describe a problem I have to solve, and I need some suggestions on whether I'm on the right path.
The problem is:
I need to create a Windows Service application that receives a request over a socket and performs an action. The action is to execute a script (maybe in Lua or Perl). The script models the client's business rules: it queries databases, makes requests to websites, and then sends a response to the client.
There are 3 mandatory requirements:
The service will receive many requests at the same time, so I'm thinking of using a worker-thread model.
The service must have high throughput. I will have many requests in the same second.
Low latency: I must respond to these requests very quickly.
Every request will generate log entries. I can't write these log entries to the physical disk while the scripts execute because of the high I/O time. I will probably keep a queue in memory and have other threads consume it and write to disk.
In the future, it is possible that two worker threads will have to exchange messages.
I have to define a protocol for this service. I was thinking of using Thrift, but I don't know the overhead involved. Maybe I will write my own protocol.
To write the Windows service, I was thinking of Erlang. Is that a good idea?
Does anyone have suggestions/hints for solving this problem? Which is the best language to write this service in?

Yes, Erlang is a good choice if you know it or are ready to learn it. With Erlang you don't need any worker threads; just implement your server in the Erlang style and you'll get a multithreaded solution automatically.
I'm not sure how to turn an Erlang program into a Windows service, but it's probably doable.
Writing to the same log file from many threads is suboptimal because it requires locking. It's better to have a log-entry queue (lock-free?) and a separate thread (Erlang process?) that writes the entries to the file. BTW, are you sure that executing an external script in another language is much faster than writing a log record to the file?
It's doubtful that you'll get much better performance from your own serialization library than Thrift provides for free. Another option is Google Protocol Buffers; some people claim it's faster.
Theoretically (!) it's possible that an Erlang solution won't give you the required performance. In that case consider a compiled language, e.g. C++, with asynchronous networking, e.g. Boost.Asio. But be prepared: it's much more complicated than the Erlang way.
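To make the log-queue idea concrete, here is a minimal sketch of a single writer thread draining an in-memory queue. It is written in Python purely for illustration (in Erlang, one dedicated process receiving log messages would play the same role), and the names log_writer, start_log_writer and stop_log_writer are hypothetical.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the writer thread to finish


def log_writer(log_queue, path):
    """Drain log entries from the queue and append them to one file."""
    with open(path, "a", encoding="utf-8") as log_file:
        while True:
            entry = log_queue.get()  # blocks until an entry is available
            if entry is _STOP:
                break
            log_file.write(entry + "\n")


def start_log_writer(path):
    """Start the single writer thread; return the queue and the thread."""
    log_queue = queue.Queue()
    writer = threading.Thread(target=log_writer, args=(log_queue, path))
    writer.start()
    return log_queue, writer


def stop_log_writer(log_queue, writer):
    """Ask the writer to write out what is queued and exit, then wait for it."""
    log_queue.put(_STOP)
    writer.join()
```

Request handlers only call log_queue.put(entry), which does not touch the disk, so only the writer thread ever opens the file and no file-level locking is needed. The queue here is unbounded; whether that is acceptable depends on how fast entries arrive compared with how fast the disk can absorb them.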

Related

Performance Testing in Mirth Connect Using JMeter

Mirth Connect is software designed to handle message flows. It has built-in support for HL7 messages in particular, so it is widely used for interfacing in healthcare applications. Over the years I have seen Mirth experience performance issues, primarily due to message build-up over time and in scenarios where it receives a heavy message load in quick succession.
Mirth has a channel-based architecture, and it would be ideal to have some way to performance test a Mirth channel and get JMeter statistics for it. That way we could gather the information needed to optimize the channel transformers and to set the purge routines accordingly.
However, there is little to no information on the Internet about how to use JMeter to test a Mirth channel. A team in Sri Lanka did some research in this area back in 2013, and I found their findings and achievements below:
http://pragmatictestlabs.com/2016/10/09/performance-testing-healthcare-application-hl7-jmeter/
However, this is very specific: the output there was a JSON object, which they extracted. In Mirth we can have outputs in various forms, so there needs to be a better way to do this. An important takeaway concerns the input, which is general: we can use JMeter to generate HL7 messages and pass them to Mirth. That's great, but how do we capture the response in a general way? It would be ideal if there were a way to read the Mirth Dashboard through JMeter; all the output statistics are there, and it's just a matter of reading them.
I have an application where Mirth reads HL7 messages, both ADT and RDE, creates a text file with the appropriate content, and drops it in a shared location. The application then reads the files and shows the information to the user.
I wish to do two performance tests here:
Measure how much time the complete system takes, from the arrival of a message to its information being available to the user, and how that varies with load.
Measure how much time the channel takes and how that changes as the load increases.
I can do the first one because I can generate HL7 messages using JMeter and I can get JMeter to read the output in the application or the database. The problem is the second: can I do this in a general way?
You asked for suggestions, so I'm going to share my general strategy for performance testing Mirth channels. I suspect that this won't be a complete answer to your question, and I might not be telling you anything you don't already know, but I'm hoping this will help you find an answer that you are comfortable with.
For several reasons, try not to spend too much time "testing the complete system":
Firstly, testing the entire system necessarily includes testing low-level configuration like the number of CPU cores, the NICs being used in the box, and kernel level software like the TCP/IP stack. You don't usually have any control over these things, so you can't optimize them in any way.
Secondly, the performance of the entire system is going to be heavily dependent on whatever ancillary code is running on the box. If a sysadmin decides to 'nice' my Mirth process down, or to use that box to also host a SQL server, that will have an impact on the system that I (again) have no control over.
Thirdly and most frankly, I find that the "performance of an entire system" is something that management asks about during system setup so they can get a cost estimate; but they know that they're only getting an estimate. You do your best to use test metrics to give a good guess for the initial hardware provisioning, but everyone knows that it's really the production performance metrics that will drive later provisioning costs.
Make sure that you build your channels for testability. I find that it's much easier to test a channel when the source and destination can be changed to "Channel Reader" and "Channel Writer" without changing message handling. One way to look at this is that you're not going to overhaul Mirth's MLLP stack or Java's TCP stack, so just eliminate these things from your testing.
I keep a source of useful test messages. I have a couple of files on a network drive that have around a hundred messages that test for nasty edge cases that I've run into over the years on my HL7 interfaces. I wrote a small Mirth channel that reads these in from a file and spews out copies as fast as it can. By turning on "Queueing" on the destination side of that channel, I can queue up a bajillion test messages that are ready to send to the channel I want to test. In the past I took the time to build a test interface that acted like a fake EMR to spew out randomly constructed messages, but there didn't seem to be any advantage over just spewing copies of the same messages from my test files.
Finally, and most importantly, it's critical that you measure the performance of your test instance using the same metrics that you'll use to measure the performance of your production instance. If the sole production metric you care about is 'messages per second', then that's what you need to measure on your test box. If memory footprint is a concern in production, then you need to measure memory usage in your test environment as well. When you make a change to your test instance that decreases an important metric by 10%, you'll need to make sure your management is aware before you push that change to production.
Note that getting some of these metrics can be tricky, since Mirth doesn't include good tools to monitor its own performance. The Mirth dashboard is a good place to keep an eye on errors or crashes, but it's not a great place to find performance data. During my testing I make sure that I use whatever resource monitoring tool the sysadmins will be using to monitor the performance of the production instance. Beyond that, I use a manual process to test performance: if I want to count messages per second, I send through a batch of messages and look at the timestamps of the first and last messages. If I want to get an idea of the CPU load of a Mirth channel, I use the Windows Performance Monitor or the POSIX 'top' command.
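The manual messages-per-second calculation described above can be sketched in a few lines of Python. This is only an illustration: the timestamp layout in TIMESTAMP_FORMAT and the function name messages_per_second are assumptions, so adjust them to whatever your message log actually shows.

```python
from datetime import datetime

# Assumed timestamp layout; adjust to whatever your logs or dashboard show.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def messages_per_second(first_stamp, last_stamp, message_count):
    """Rough throughput of a batch from its first and last timestamps."""
    first = datetime.strptime(first_stamp, TIMESTAMP_FORMAT)
    last = datetime.strptime(last_stamp, TIMESTAMP_FORMAT)
    elapsed = (last - first).total_seconds()
    if elapsed <= 0:
        raise ValueError("last timestamp must be later than the first")
    return message_count / elapsed


rate = messages_per_second("2024-01-01 10:00:00.000", "2024-01-01 10:00:20.000", 1000)
print(rate)  # 1000 messages over 20 seconds -> 50.0
```

The figure is only as good as the batch size: with a small batch, start-up effects dominate, so it is probably worth sending enough messages for the run to last at least several seconds.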

ZeroMQ and actor model

I'm having problems scaling up an application that uses the actor model and zeromq. To put it simply: I'm trying to create thousands of threads that communicate via sockets, similar to what one would do with Erlang-style message passing. I'm not doing it for multicore/performance reasons, but because framing it in this way gives me very clean code.
From a philosophical point of view it sounds as if this is what zmq developers would like to achieve, e.g.
http://zeromq.org/whitepapers:multithreading-magic
However, it seems as if there are some practical limitations. At 1024 inproc sockets I start getting the "ZMQError: Too many open files" error. TCP gives me the typical "Assertion failed: fds.size () <= FD_SETSIZE" crash.
Why do inproc sockets have this limit?
To get it to work I've had to group together items to share a socket. Is there a better way?
Is zmq just the wrong tool for this kind of job? i.e. it's still more a network library than an actor message passing library?
ZMQ uses file descriptors as the "resource unit" for inproc connections. The OS sets a limit on file descriptors; you should be able to raise it (I found several potential avenues for Windows with a quick Google search), though I don't know what the performance impact might be.
It looks like this is related to the ZMQ library using C code that is portable among systems for opening new files, rather than Windows native code that doesn't suffer from this same limitation.

Using gevent and multiprocessing together to communicate with a subprocess

Question:
Can I use the multiprocessing module together with gevent on Windows in an efficient way?
Scenario:
I have a gevent based Python application doing asynchronous I/O on Windows. The application is mostly I/O bound, but there are spikes of higher CPU load as well. This application would need to control a console application via its stdin and stdout. I cannot modify this console application and the user will be able to use his own custom one, only the text (line) based communication protocol is fixed.
I have a working implementation using subprocess and threads, but I would rather move the whole subprocess based communication code together with those threads into a separate process to turn the main application back to single-threaded. I plan to use the multiprocessing module for this.
Prior reading:
I have been searching the Web a lot and read some source code, so I know that the multiprocessing module uses a Pipe implementation based on named pipes on Windows. A pair of multiprocessing.Queue objects would be used to communicate with the second Python process. These queues are based on that Pipe implementation, i.e. the IPC would be done via named pipes.
The key question is, whether calling the incoming Queue's get method would block gevent's main loop or not. There's a timeout for that method, so I could make it into a loop with a small timeout, but that's not a good solution, since it would still block gevent for small time periods hurting its low I/O latency.
I'm also open to suggestions on how to circumvent the whole problem of using pipes on Windows, which is known to be hard and sometimes fragile. I'm not sure whether shared memory based IPC is possible on Windows or not. Maybe I could wrap the console application in a way which would allow communicating with the child process using network sockets, which is known to work well with gevent.
Please don't question my primary use case, if possible. Thanks.
The Queue's get method really is blocking. Using it with a timeout could potentially solve your problem, but it definitely won't be the cleanest solution and, most importantly, it will introduce extra latency for no good reason. Even if it weren't blocking, it still wouldn't be a good solution, because being non-blocking is not enough by itself: a good asynchronous call/API should integrate smoothly into the I/O framework in use, be that gevent for Python, libevent for C or Boost.Asio for C++.
The easiest solution would be to use simple I/O by spawning your console application and attaching to its console input and output descriptors. There are at least two major factors to consider:
It will be extremely easy for your clients to write client applications. They will not have to work with any kind of IPC, socket or other code, which could be very hard for many of them. With this approach, the application will just read from stdin and write to stdout.
It will be extremely easy to test console applications using this approach as you can manually start them, enter text into console and see results.
Gevent is a perfect fit for async read/write here.
However, the downside is that you will have to start this application, there will be no support for concurrent communication with it, and there will be no support for communication over the network. There is even a good example for starters.
To keep it simple but more flexible, you can use TCP/IP sockets. If both client and server are running on the same machine, a good operating system will use IPC as the underlying implementation, so it will be fast. And if you are worried about performance in this case, you probably should not use Python at all and should look at other technologies.
An even fancier solution: use ZeroC ICE. It is a very modern technology that allows almost seamless inter-process communication. It is a CORBA killer and very easy to use. It is heavily used by many, proven to be the fastest in its class, and rock stable. The beauty of this solution is that you can seamlessly integrate programs in many different languages, like Python, Java, C++ etc. But it will take some of your time to get familiar with the concept. If you decide to go this way, just spend a day reading through the documentation.
Hope it helps. Good luck!
Your question is already quite old. Nevertheless, I would like to recommend http://gehrcke.de/gipc which -- I believe -- would tackle the outlined challenge in a very straight-forward fashion. Basically, it allows you to integrate multiprocessing-based child processes anywhere in your application (also on Windows). Interaction with Process objects (such as calling join()) is gevent-cooperative. Via its pipe management, it allows for cooperatively blocking inter-process communication. However, on Windows, IPC currently is much less efficient than on POSIX-compliant systems (since non-blocking I/O is imitated through a thread pool). Depending on the IPC messaging volume of your application, this might or might not be of significance.

Creating proxy between application queries and Internet

Is it possible (for example with C++, but it does not really matter) to create a bridge/proxy application to get the data requested by another application? To be more specific, I'm talking about an Adobe AIR based game. (I want to create a report with stats based on the data acquired, but that is not actually part of this question.)
Rather than a simple "boolean" answer, please provide a link to an example or documentation. Thanks.
It would always be possible and, depending on your target operating system, may require a fair amount of effort, which begs the question: is there a reason you cannot use Fiddler or some packet-sniffing software for your target OS?
You can write a proxy by hand; in Python this can be quite easy. All you have to do is set localhost as the proxy, then forward each request and pass the response back to the calling socket.
I started writing something like this some time ago. The idea was to write a simple replacement for DansGuardian.
I've uploaded it to GitHub so you can take a look if it helps.
I don't remember it well (I started writing it last year), but with some modification it may fit your needs.
Conceptually, this is your configuration:
app_client -> [app_channel] -> proxy -> [server_channel] -> app_server
Your proxy starts a server socket, and the app_client connects to it. This is your app_channel. Now your proxy creates a connection to the app_server. This is your server_channel.
Now start 2 threads, one which reads from the app_channel and writes to the server_channel, the other reads from the server_channel and writes to the app_channel.
This will create a transparent connection to the app_server via your proxy. You can extract the data as you wish. If the data is encrypted though, there's very little you can actually do by way of analysis.
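The two-thread relay described above can be sketched with Python's standard socket and threading modules. This is a minimal illustration rather than a hardened proxy: the names pump, handle_client and serve, and the optional on_data hook for inspecting the bytes, are hypothetical.

```python
import socket
import threading


def pump(src, dst, on_data=None):
    """Copy bytes from src to dst until src closes; optionally inspect them."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            if on_data is not None:
                on_data(data)  # hook for extracting or logging the traffic
            dst.sendall(data)
    except OSError:
        pass  # one of the sockets was already closed
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)  # pass end-of-stream to the receiver
        except OSError:
            pass


def handle_client(app_channel, server_addr, on_data=None):
    """Open the server_channel and relay both directions with two threads."""
    server_channel = socket.create_connection(server_addr)
    to_server = threading.Thread(target=pump, args=(app_channel, server_channel, on_data))
    to_client = threading.Thread(target=pump, args=(server_channel, app_channel, on_data))
    to_server.start()
    to_client.start()
    to_server.join()
    to_client.join()
    app_channel.close()
    server_channel.close()


def serve(listener, server_addr, on_data=None):
    """Accept app_client connections and relay each one to app_server."""
    while True:
        try:
            app_channel, _ = listener.accept()
        except OSError:
            break  # the listening socket was closed
        threading.Thread(
            target=handle_client,
            args=(app_channel, server_addr, on_data),
            daemon=True,
        ).start()
```

To use it, bind and listen on a local socket, pass that socket to serve together with the (host, port) of the app_server, and point the app_client at the local port. The on_data callback sees raw bytes in both directions, so as noted above it is of little use if the traffic is encrypted.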

Looking for pattern/approach/suggestions for handling long-running operation tied to web app

I'm working on a consumer web app that needs to do a long running background process that is tied to each customer request. By long running, I mean anywhere between 1 and 3 minutes.
Here is an example flow. The object/widget doesn't really matter.
1. Customer comes to the site and specifies object/widget they are looking for.
2. We search/clean/filter for widgets matching some initial criteria. <-- long running process
3. Customer further configures more detail about the widget they are looking for.
4. When the long running process is complete the customer is able to complete the last few steps before conversion.
Steps 3 and 4 aren't really important. I just mention them because we can buy some time while we are doing the long running process.
The environment we are working in is a LAMP stack-- currently using PHP. It doesn't seem like a good design to have the long running process take up an apache thread in mod_php (or fastcgi process). The apache layer of our app should be focused on serving up content and not data processing IMO.
A few questions:
Is our thinking right in that we should separate this "long running" part out of the apache/web app layer?
Is there a standard/typical way to break this out under Linux/Apache/MySQL/PHP (we're open to using a different language for the processing if appropriate)?
Any suggestions on how to go about breaking it out? E.g. do we create a daemon that churns through a FIFO queue?
Edit: Just to clarify, only about 1/4 of the long running process is database centric. We're working on optimizing that part. There is some work that we could potentially do, but we are limited in the amount we can do right now.
Thanks!
Consider providing the search results via AJAX from a web service instead of your application. Presumably you could offload this to another server and let your web application deal with the content as you desire.
Just curious: 1-3 minutes seems like a long time for a lookup query. Have you looked at indexes on the columns you are querying to improve the speed? Or do you need to do some algorithmic process -- perhaps you could perform some of this offline and prepopulate some common searches with hints?
As Jonnii suggested, you can start a child process to carry out background processing. However, this needs to be done with some care:
Make sure that any parameters passed through are escaped correctly
Ensure that more than one copy of the process does not run at once
If several copies of the process can run, there's nothing stopping a (not even malicious, just impatient) user from hitting reload on the page which kicks it off, eventually starting so many copies that the machine runs out of RAM and grinds to a halt.
So you can use a subprocess, but do it carefully, in a controlled manner, and test it properly.
Another option is to have a daemon permanently running and waiting for requests, which processes them and then records the results somewhere (perhaps in a database).
This is the poor man's solution:
exec ("/usr/bin/php long_running_process.php > /dev/null &");
Alternatively you could:
Insert a row into your database with details of the background request, which a daemon can then read and process.
Write a message to a message queue, which a daemon then reads and processes.
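The database-row variant can be sketched as follows. This is an illustration under stated assumptions, not a finished design: it uses Python's built-in sqlite3 as a stand-in for MySQL, the jobs table and every function name are hypothetical, and it assumes a single daemon (with several workers, each row would have to be claimed atomically before processing).

```python
import sqlite3


def init_queue(conn):
    """Create the hypothetical jobs table that acts as the FIFO queue."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " params TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " result TEXT)"
    )
    conn.commit()


def enqueue(conn, params):
    """Called by the web request: record the job and return its id at once."""
    cur = conn.execute("INSERT INTO jobs (params) VALUES (?)", (params,))
    conn.commit()
    return cur.lastrowid


def process_next(conn, work):
    """One daemon iteration: run the oldest pending job, in FIFO order."""
    row = conn.execute(
        "SELECT id, params FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None  # nothing to do; a real daemon would sleep and retry
    job_id, params = row
    result = work(params)
    conn.execute(
        "UPDATE jobs SET status = 'done', result = ? WHERE id = ?",
        (result, job_id),
    )
    conn.commit()
    return job_id


def get_result(conn, job_id):
    """Polled by the web app, for example from an AJAX call."""
    return conn.execute(
        "SELECT status, result FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
```

The web layer only inserts a row and later polls get_result with the returned id, so no Apache worker is tied up for the one to three minutes the job takes. Because the job is keyed by a row rather than by a spawned process, an impatient page reload adds at most another row instead of another running copy.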
Here's some discussion on the Java version of this problem.
See java: what are the best techniques for communicating with a batch server
Two important things you might do:
Switch to Java and use JMS.
Read up on JMS but use another queue manager. Unix named pipes, for instance, might be an acceptable implementation.
Java servlets can do background processing. You could do something similar in any web technology with threading support. I don't know about PHP, though.
Not a complete answer, but I would think about using AJAX and passing the second step to something that's faster than PHP (C, C++, C#), then having a PHP function pick the results off some stack, most likely just a database.

Resources