Testing file transfer speed across LAN/WAN

Testing file transfer speed across LAN/WAN - windows

Is there a utility for Windows that allows you to test different aspects of file transfer operations across a Lan or a Wan.
Example...
How long does it take to move a file of a known size (500 MB or 1 GB) from Server A (on site) to Server B (on site) or to Server C (off site-Satellite location)?

D-ITG will allow you to test many aspects of your links. It does not necessarily allow you transfer a file directly, but it allows you to control almost all aspects of the transmission of data across the wire.
If all you are interested in is bulk transfer time (and not all the nitty-gritty details) you could just use a basic FTP application and time the transfer.

Probably nothing you've not already figured out. You could get some coarse grain metrics using a batch file to coordinate:
start monitoring
copy file
stop monitoring
Copy file might just be initiating a file copy between two nodes on the LAN, or it might initiate a FTP copy between two nodes on the WAN.
Monitoring could be as basic as writing the current time to output or file, or it could be as complex as adding performance counter metrics from the network adapter on the two machines.
A commercial WAN emulator would also give you the information your looking for. I've used the Shunra Appliance successfully in the past. Its pretty expensive, so I'd really only recommend it if critical business success is riding on understanding how application behavior could change based on network conditions and is something you could incorporate into regular testing activities.

Related

Performance Testing in Mirth Connect Using JMeter

Mirth Connect is a software that is designed to handle a message flow and it has built-in support to handle HL7 messages in particular and therefore this software is widely used for interfacing in Healthcare applications. Over the years I have seen the Mirth software experiencing performance issues primarily due to the message build up over time and in scenarios where it receives a heavy message load in quick succession.
Mirth has a channel-based architecture and it's ideal if there is some way we can performance test the Mirth channel and get JMeter statistics for its performance. Whereby we can gather the necessary information to optimize the channel transformers and also to set the purge routines accordingly.
However in the Internet there was little to no information on this area, that is how one can use JMeter to test a Mirth channel. A team in Sri Lanka did some research on this area back in 2013 and I found their findings and achievements below
http://pragmatictestlabs.com/2016/10/09/performance-testing-healthcare-application-hl7-jmeter/
However this is very specific the output here was a JSon object which they extracted, in Mirth however we can have outputs in various forms and there need to be a better way to do this. An important takeaway from this is the input that is the input is general we can use JMeter to generate HL7 messages and pass them to Mirth that's great but how to capture the response generally, it would be ideal if there is a way to read the Mirth Dashboard through JMeter, all the output statistics are there it's just a matter of reading them.
I have an application where Mirth reads HL7 messages both ADT and RDE and creates a text file accordingly with appropriate content and drops it to a shared location. Then the application reads the files and shows the information to the user.
I wish to do two performance tests here
Measure how much time the complete system takes and how it varies with load from the arrival of a message to its information being available to the user
Measure how much time the channel takes and how it does it as the load increases
I can do the first one because I can generate HL7 messages using JMeter and I can get JMeter to read the output in the application or the database. The problem is with the second, can I do this in a general way.

You asked for suggestions, so I'm going to share my general strategy for performance testing Mirth channels. I suspect that this won't be a complete answer to your question, and I might not be telling you anything you don't already know, but I'm hoping this will help you find an answer that you are comfortable with.
For several reasons, try not to spend too much time "testing the complete system":
Firstly, testing the entire system necessarily includes testing low-level configuration like the number of CPU cores, the NICs being used in the box, and kernel level software like the TCP/IP stack. You don't usually have any control over these things, so you can't optimize them in any way.
Secondly, the performance of the entire system is going to be heavily dependant on whatever ancillary code is running on the box. If a sysadmin decides to 'nice' my Mirth process down, or to use that box to also host a SQL server, that will have an impact on the system that I (again) have no control over.
Thirdly and most frankly, I find that the "performance of an entire system" is something that management asks about during system setup so they can get a cost estimate; but they know that they're only getting an estimate. You do your best to use test metrics to give a good guess for the initial hardware provisioning, but everyone knows that it's really the production performance metrics that will drive later provisioning costs.
Make sure that you build your channels for testability. I find that it's much easier to test a channel when the source and destination can be changed to "Channel Reader" and "Channel Writer" without changing message handling. One way to look at this is that you're not going to overhaul Mirth's MLLP stack or Java's TCP stack, so just eliminate these things from your testing.
I keep a source of useful test messages. I have a couple of files on a network drive that have around a hundred messages that test for nasty edge cases that I've run into over the years on my HL7 interfaces. I wrote a small Mirth channel that reads these in from a file and spews out copies as fast as it can. By turning on "Queueing" on the destination side of that channel, I can queue up a bajillion test messages that are ready to send to the channel I want to test. In the past I took the time to build a test interface that acted like a fake EMR to spew out randomly constructed messages, but there didn't seem to be any advantage over just spewing copies of the same messages from my test files.
Finally, and most importantly, it's critical that you measure the performance of your test instance using the same metrics that you'll use to measure the performance of your production instance. If the sole production metric you care about is 'messages per second', then that's what you need to measure on your test box. If memory footprint is a concern in production, then you need to measure memory usage in your test environment as well. When you make a change to to your test instance that decreases an important metric by 10%, you'll need to make sure your management is aware before you push that change to production.
Note that getting some of these metrics can be tricky, since Mirth doesn't include good tools to monitor its own performance. The Mirth dashboard is a good place to keep an eye on errors or crashes, but it's not a great place to find performance data. During my testing I make sure that I use whatever resource monitoring tool that the sysadmins will be using to monitor the performance of the production instance. Beyond that, I use a manual process to test performance: If I want to count message per second, I send through a batch of messages and look at the timestamps of the first and last messages. If I want to get an idea of the CPU load of a Mirth channel, I use the Windows Performance Monitor or the posix 'top' command.

How to calculate total network traffic for a period of time for a specific application?

I'm doing performance testing of a native application on Windows and I need to calculate how much more internet traffic new application version produce compared to previous version. Because application is meant to be working in environment with limited internet connection.
Fiddler displays only HTTP and FTP requests and only those that were sent through proxy. In theory application can ignore proxy and use other protocols or sockets.
Resource Monitor seems to contains only average network activity for last minute (Total B/sec). It is not enough for me because network traffic produced by application is not constant.
Network-related performance counters doesn't contain no relevant counters to look at.
TCPView for some reason do not show information for some processes. It display traffic for specific connection rather than application and when connection is closed information is lost.

After more detailed research I found that Sysinternals Process Explorer looks like appropriate tool for internet traffic estimation. You can add Network Send Bytes and Network Recieve Bytes columns to processes table and manually calculate their values difference at the time range boundaries that you are interested in. In order to this to work you need to start Process Explorer as administrator.

algorithm for synchronizing text between client/server

What is a low-latency, low-bandwidth algorithm for synchronizing, say, a text file between a client and a server?
Is there a design where the client send a delta of it's current state and it's last ACK'd state from the server? I am thinking Quake3 networking..
EDIT 1:
More specifically, how would a diff/delta algorithm behave in a client/server environment.
e.g. Is it more expensive to calculate a diff on the client side, send to server, server interprets and updates its store, sends ACK to client? Or is it cheaper to have a replication model where client sends its full state and server stores it..?
EDIT 2:
100 KB text file. Something small, not too large.

You mean like a diff?
Store the server-side's version of the file in the client. Whenever you need to synchronize, run a diff (you can either write your own or use a library). Then send the difference over to the server and have the server patch it's local version.

If a client also edits text, and has an undo/redo feature then undo stack can be used for delta. For large texts and small changes using undo stack should be more efficient than running a diff.

For text you can use delta algorithm, take a look, for example, on how rsync works.
Google uses a different approach to update chrome, you can "google" it to see.
Edit: If it was a server generating one change and replicating in lots of clients, it should be done in server. From the question's changes, I understood that a client (or many clients) will produce the changes and want them to be replicated on server.
Well... I'd take in account 4 things:
network performance
number of clients
number of changes expected
performance of the server and of the client
Too many clients sending and doing that on server: it's almost a DoS
I'd only do that on server if there were few clients, high server performance and low client performance.
Otherwise, I'd only do that on clients.

What's the best way to monitor a large number of Ruby processes?

I have a farm of several physical servers each running a large number of Ruby "workers" (daemon-like processes) and I'd like to be able to monitor the health and progress of these processes from a central location, perhaps with historical graphing like Cacti provides. What's the simplest preferably-open-standard protocol for doing something like that? Please note I'm already using monit to keep the processes up and running and under control; what I'm asking for here is a single point of entry (i.e. dashboard) for checking in on them. Thanks.

If you are already using Monit then M/Monit sounds like a perfect match.
"M/Monit expand upon Monit's capabilities to provide monitoring and management of all Monit enabled hosts from one simple to use web-interface. " - http://mmonit.com/

G'day,
What about having a monitoring process on each server that checks the status of each process and then writes that out to a flat text file, say once every five minutes.
Then another process located on a central server can retrieve at those flat files and trawl through the results and flag any issues.
If you save the individual files and timestamp them, you would also be able to see any trends forming.
Just a quick ideea.
BTW The above system is used to monitor the servers in one of the largest websites in the world. Our scripts are written in Perl with a little bit of shell script but I don't see why you couldn't write your monitoring scripts in Ruby as well.
HTH
cheers,

I'd suggest to take a look at Zabbix.
It's not as simple as monit, of course, but it allows you to run data collecting agent on each of your servers, with all agents feeding the central reporting and storage server with their data. Those agents can use any custom scripts to get the metrics - you can write simple scripts to extract the data you need from your workers, send it back to the central reporting server and display it there on the dashboard.

Best approach to collecting log files from remote machines?

I have over 500 machines distributed across a WAN covering three continents. Periodically, I need to collect text files which are on the local hard disk on each blade. Each server is running Windows server 2003 and the files are mounted on a share which can be accessed remotely as \server\Logs. Each machine holds many files which can be several Mb each and the size can be reduced by zipping.
Thus far I have tried using Powershell scripts and a simple Java application to do the copying. Both approaches take several days to collect the 500Gb or so of files. Is there a better solution which would be faster and more efficient?

I guess it depends what you do with them ... if you are going to parse them for metrics data into a database, it would be faster to have that parsing utility installed on each of those machines to parse and load into your central database at the same time.
Even if all you are doing is compressing and copying to a central location, set up those commands in a .cmd file and schedule it to run on each of the servers automatically. Then you will have distributed the work amongst all those servers, rather than forcing your one local system to do all the work. :-)

The first improvement that comes to mind is to not ship entire log files, but only the records from after the last shipment. This of course is assuming that the files are being accumulated over time and are not entirely new each time.
You could implement this in various ways: if the files have date/time stamps you can rely on, running them through a filter that removes the older records from consideration and dumps the remainder would be sufficient. If there is no such discriminator available, I would keep track of the last byte/line sent and advance to that location prior to shipping.
Either way, the goal is to only ship new content. In our own system logs are shipped via a service that replicates the logs as they are written. That required a small service that handled the log files to be written, but reduced latency in capturing logs and cut bandwidth use immensely.

Each server should probably:
manage its own log files (start new logs before uploading and delete sent logs after uploading)
name the files (or prepend metadata) so the server knows which client sent them and what period they cover
compress log files before shipping (compress + FTP + uncompress is often faster than FTP alone)
push log files to a central location (FTP is faster than SMB, the windows FTP command can be automated with "-s:scriptfile")
notify you when it cannot push its log for any reason
do all the above on a staggered schedule (to avoid overloading the central server)
Perhaps use the server's last IP octet multiplied by a constant to offset in minutes from midnight?
The central server should probably:
accept log files sent and queue them for processing
gracefully handle receiving the same log file twice (should it ignore or reprocess?)
uncompress and process the log files as necessary
delete/archive processed log files according to your retention policy
notify you when a server has not pushed its logs lately

We have a similar product on a smaller scale here. Our solution is to have the machines generating the log files push them to a NAT on a daily basis in a randomly staggered pattern. This solved a lot of the problems of a more pull-based method, including bunched-up read-write times that kept a server busy for days.

It doesn't sound like the storage servers bandwidth would be saturated, so you could pull from several clients at different locations in parallel. The main question is, what is the bottleneck that slows the whole process down?

I would do the following:
Write a program to run on each server, which will do the following:
Monitor the logs on the server
Compress them at a particular defined schedule
Pass information to the analysis server.
Write another program which sits on the core srver which does the following:
Pulls compressed files when the network/cpu is not too busy.
(This can be multi-threaded.)
This uses the information passed to it from the end computers to determine which log to get next.
Uncompress and upload to your database continuously.
This should give you a solution which provides up to date information, with a minimum of downtime.
The downside will be relatively consistent network/computer use, but tbh that is often a good thing.
It will also allow easy management of the system, to detect any problems or issues which need resolving.

NetBIOS copies are not as fast as, say, FTP. The problem is that you don't want an FTP server on each server. If you can't process the log files locally on each server, another solution is to have all the server upload the log files via FTP to a central location, which you can process from. For instance:
Set up an FTP server as a central collection point. Schedule tasks on each server to zip up the log files and FTP the archives to your central FTP server. You can write a program which automates the scheduling of the tasks remotely using a tool like schtasks.exe:
KB 814596: How to use schtasks.exe to Schedule Tasks in Windows Server 2003
You'll likely want to stagger the uploads back to the FTP server.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio