Monitoring tool accuracy - Debugging application latency

We are having latency issues in one of our network applications. Most of the time requests are handled within 100ms, but sometimes a request can take up to a few seconds for no apparent reason.
So I hooked up some monitoring tools and looked at what was happening (Wireshark to monitor the network externally through port replication, and Process Monitor to see what was happening on the local machine).
I was able to match TCP packets, and they were usually within a millisecond of each other in both log files. But on one occurrence, the last packet of a series was delayed by more than 250ms in Process Monitor compared to Wireshark (and the application's erratic behavior, due to latency, was being observed).
Since Wireshark was hooked up on another computer, I'm quite sure that what was being monitored was accurate: all the packets did reach the network card on time.
As for Process Monitor, I'm not totally sure how it works: when is the network data registered? Is it when it reaches the network card? When it is made available to the application? When the application reads the data?
During these 250ms there were a few other events being registered, which leads me to believe that Process Monitor was recording correctly and that this 250ms delay wasn't "created" by it.
Any help regarding the behavior of Process Monitor, the method I'm currently using to dig into the problem, or what you think could be the problem would be much appreciated.

Option 2
Perhaps you're experiencing the infamous 250ms delays that the GC causes from time to time (link). You can accurately measure GC suspensions using a specialized CLR host (link).
Option 1 - was ruled out
Since you are using TCP, I'd suggest that you turn on the NoDelay option on your socket, just to eliminate the possibility that you're suffering from a clash between Nagle's Algorithm and the Delayed ACK algorithm. If you're experiencing "batching" of packets while a packet is sometimes "delayed" for about 200ms, then this just might be the issue.
A more in-depth explanation of this behavior can be found here.
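For reference, here is a minimal sketch of disabling Nagle's algorithm on a socket. It uses Python's standard socket module purely for illustration, since the question doesn't say which language or socket API the application uses, and the host/port are placeholders:

```python
import socket

# Connect to the peer (host and port are placeholders for illustration).
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("server.example.com", 9000))

# Disable Nagle's algorithm so small writes go out immediately instead of
# being buffered while waiting for an ACK of previously sent data.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

sock.sendall(b"small request")
```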

Related

How to calculate total network traffic for a period of time for a specific application?

I'm doing performance testing of a native application on Windows, and I need to calculate how much more internet traffic the new application version produces compared to the previous version, because the application is meant to work in an environment with a limited internet connection.
Fiddler displays only HTTP and FTP requests, and only those that were sent through the proxy. In theory the application can ignore the proxy and use other protocols or raw sockets.
Resource Monitor seems to contain only the average network activity for the last minute (Total B/sec). That is not enough for me, because the network traffic produced by the application is not constant.
The network-related performance counters don't contain any relevant counters to look at.
TCPView for some reason does not show information for some processes. It displays traffic per connection rather than per application, and when a connection is closed the information is lost.
After more detailed research I found that Sysinternals Process Explorer looks like an appropriate tool for estimating internet traffic. You can add the Network Send Bytes and Network Receive Bytes columns to the process table and manually calculate the difference between their values at the boundaries of the time range you are interested in. For this to work you need to start Process Explorer as administrator.
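If you want to automate the "difference between two readings" part, a rough sketch in Python is shown below. Note the caveat: psutil's per-process io_counters() on Windows report all I/O (disk, network and device), not network traffic alone, so this only approximates what the Process Explorer columns show; the process name is a placeholder.

```python
import time
import psutil

def total_io_bytes(proc: psutil.Process) -> int:
    """Bytes read plus bytes written by the process (all I/O, not only network)."""
    io = proc.io_counters()
    return io.read_bytes + io.write_bytes

# "myapp.exe" is a placeholder for the application under test.
proc = next(p for p in psutil.process_iter(["name"]) if p.info["name"] == "myapp.exe")

before = total_io_bytes(proc)
time.sleep(60)                      # the measurement window you are interested in
after = total_io_bytes(proc)

print(f"I/O during the window: {after - before} bytes")
```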

Simulate slow speed for TCP sockets in Windows

I'm building an application that uses TCP sockets to communicate. I want to test how it behaves under slow-speed conditions.
There are similar questions on the site, but as I understand it, they deal with HTTP traffic or are about Linux. My traffic is not HTTP, just ordinary TCP sockets, and the OS is Windows.
I tried using Fiddler's Modem Speed setting, but it didn't work; it seems to apply only to HTTP connections.
While it is true that you probably want to invest in an extensive set of unit tests, you can simulate various network conditions using VMware Workstation:
You will have to install a virtual machine for testing, set up bridged networking (so the VM can access your real network) and upload your code to the VM.
After that you can start changing the settings and see how your application performs.
NetLimiter can also be used, but it has fewer options (in your case, packet loss is very interesting to test and is not available in NetLimiter).
There is an excellent utility for Windows that can do throttling and much more:
https://jagt.github.io/clumsy/
I think you're taking the wrong approach here.
You can achieve everything that you need with some well designed unit tests. All of the things that a slow network link causes can be simulated in a unit test environment in controlled conditions.
Things that your code MUST handle to deal with "slow" links are just things that you should be dealing with anyway, including:
The correct handling of fragmented messages. All of your network reading code needs to correctly assume that each read will return between 1 byte and the size of your read buffer. You should never assume that you'll get complete 'messages', as TCP knows nothing of your concept of messages (see the sketch after this list).
TCP flow control causing either your synchronous sends to fail with some form of 'try later' error or your async sends to succeed and potentially use an uncontrolled amount of resources (see here for more details). Note that this can happen even on 'fast' links if you are sending faster than the receiver is consuming.
Timeouts - again, this isn't limited to "slow" links. All of your timeout handling code should be robust and tested. You may want to make sure that any read timeout is based on an individual read completing rather than on receiving a complete message within x time. You may be getting your data at a slow rate, but as long as you're still getting data the link is alive.
Connection failure - again not something specific to "slow" links. You need to know how you deal with connections being reset at any time.
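As an illustration of the first and third points, here is a minimal sketch (in Python, since the question doesn't specify a language) of a length-prefixed read loop that treats every recv() as a potentially partial read and bases its timeout on individual reads completing rather than on receiving a whole message:

```python
import socket
import struct

def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes, accepting that each recv() may return
    anywhere from 1 byte up to the requested size."""
    chunks = []
    remaining = count
    while remaining:
        chunk = sock.recv(remaining)      # raises socket.timeout if nothing arrives in time
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def recv_message(sock: socket.socket) -> bytes:
    # The timeout applies to each individual read, so a slow-but-alive link
    # that keeps delivering bytes never trips it.
    sock.settimeout(5.0)
    (length,) = struct.unpack("!I", recv_exact(sock, 4))   # 4-byte length prefix
    return recv_exact(sock, length)
```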
In summary, everything you could learn by running your client and server on a simulated slow network can also be covered by a decent set of unit tests, and everything you would want to test on such a link is something that could affect any of your connections, at any link speed.

How to slow my internet connection down so that I can test what my site looks like on a slower connection?

My area recently got 4G internet and it has sped things up too much. Yes, you read that right: I want to be able to slow down my browser so that I can watch websites loading. Partly for testing my own site, so that I can see what people with slower connections see. Also, with a lot of sites what I want to see is at the top, so on a slower connection I can stop downloading the rest of the site once that part has loaded and save some of my bandwidth for other things.
Is there a program, or an add-on for Firefox, that would let me do such a thing? If I have to, I could limit the connection itself. I am on a Windows 7 machine with Verizon mobile broadband that plugs in like a flash drive.
You can use chrome to simulate internet speed directly.
See this: https://developers.google.com/web/tools/chrome-devtools/network-performance/network-conditions
You can use Fiddler and its Simulate Modem Speeds feature.
Main menu -> Rules -> Performance -> Simulate Modem Speeds
Here is what I found in:
http://www.charlesproxy.com/documentation/proxying/throttling/
"Charles can be used to adjust the bandwidth and latency of your Internet connection. This enables you to simulate modem conditions using your high-speed connection.
The bandwidth may be throttled to any arbitrary bytes per second. This enables any connection speed to be simulated.
The latency may also be set to any arbitrary number of milliseconds. The latency delay simulates the latency experienced on slower connections, that is the delay between making a request and the request being received at the other end"
There are a couple of tools on the market which can throttle your network speed, both uplink and downlink. http://bandwidthcontroller.com/trafficShaperXp.html is one such tool. There are a couple of others as well. We generally do it via the Shunra emulator.
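If you cannot install any of these tools, the basic idea behind them can be reproduced with a small relay that caps throughput and adds latency. The sketch below (Python, single connection, hypothetical ports and rates) is only a toy illustration of what the throttling proxies above do for you:

```python
import socket
import threading
import time

LISTEN_PORT = 9000            # your application connects here (hypothetical)
TARGET = ("127.0.0.1", 8080)  # the real server (hypothetical)
BYTES_PER_SEC = 4000          # simulated bandwidth cap
EXTRA_LATENCY = 0.2           # seconds of delay added to every chunk

def relay(src: socket.socket, dst: socket.socket) -> None:
    """Forward data from src to dst, pacing it to the configured bandwidth."""
    while True:
        data = src.recv(512)
        if not data:
            break
        time.sleep(EXTRA_LATENCY + len(data) / BYTES_PER_SEC)
        dst.sendall(data)

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("0.0.0.0", LISTEN_PORT))
listener.listen(1)
client, _ = listener.accept()            # point the application at LISTEN_PORT
server = socket.create_connection(TARGET)

# One thread per direction so the relay is full duplex.
threading.Thread(target=relay, args=(client, server), daemon=True).start()
relay(server, client)
```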

Win32Native.ReadFile waiting for synchronisation, a performance bottleneck?

I profiled an application. Basically every thread reads an XML file from a network share, deserializes an object, logs to local files, logs asynchronously to a DB and calls a web service.
The number of threads is about 14 on a 24-core machine.
The Redgate profiler shows me that the multithreaded application is waiting for synchronisation 70% of the time. Is this an alarming signal or to be expected? Further, if you can give advice on how to approach analysing such a profiler log, please share your knowledge.
Waiting for synchronization just means that a thread is suspended while waiting for another thread to complete an operation. Whether or not you should be concerned about this depends on how long you expect the operation on that thread to take to reach completion.
If the stack indicates a read/write, then it may just mean the disk is slow, for example. Maybe you can minimize that by changing your code; maybe it's just a flaky network or disk drive.
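To get a feel for what the profiler is reporting, here is a crude sketch of measuring "time spent waiting versus working" in plain Python; in your case the shared resource is more likely a lock around the log files or the DB logging queue than the toy lock and sleep used here:

```python
import threading
import time

shared_lock = threading.Lock()   # stands in for whatever the threads contend on
wait_times = []
wait_times_lock = threading.Lock()

def worker() -> None:
    start = time.perf_counter()
    with shared_lock:                     # the "waiting for synchronisation" part
        waited = time.perf_counter() - start
        time.sleep(0.05)                  # stands in for the real work (I/O, parsing, ...)
    with wait_times_lock:
        wait_times.append(waited)

threads = [threading.Thread(target=worker) for _ in range(14)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"threads spent {sum(wait_times):.2f}s blocked on the lock in total")
```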

How slow are TCP sockets compared to named pipes on Windows for localhost IPC?

I am developing a TCP Proxy to be put in front of a TCP service that should handle between 500 and 1000 active connections from the wild Internet.
The proxy is running on the same machine as the service, and is mostly-transparent. The service is for the most part unaware of the proxy, the only exception being the notification of the real remote IP address of the clients.
This means that, for every inbound open TCP socket, there are two more sockets on the server: the second of the pair in the proxy, and the one on the real service behind the proxy.
The send and recv window sizes on the two Proxy sockets are set to 1024 bytes.
What are the performance implications on this? How slow is this configuration? Should I put some effort on changing the service to use Named Pipes (or other IPC mechanism), or a localhost TCP socket is for the most part an efficient IPC?
The merge of the two apps is not an option. Right now we are stuck with the two process configuration.
EDIT: The reason for having two separate processes on the same hardware is 100% economics. We have only one server, and we are not planning on getting more (no money).
The TCP service is a legacy software in Visual Basic 6 which grew beyond our expectations. The proxy is C++. We don't have the time, money nor manpower to rewrite and migrate the VB6 code to a modern programming environment.
The proxy is our attempt to mitigate a specific performance issue on the service, a DDoS attack we are getting from time to time.
The proxy is open source, and here is the project source code.
It will be the same (or at least not measurably different). Winsock is smart enough to know if it's talking to a socket on the same host and, in that case, it will short-circuit pretty much everything below IP and copy data directly buffer-to-buffer. In terms of named pipes vs. sockets, if you need to potentially be able to communicate to different machines ever in the future, choose sockets. If you know for a fact that you'll never need to do that, pick whichever one your developers are most familiar or most comfortable with.
For anyone that comes to read this later, I want to add some findings that answer the original question.
For a utility we are developing we have a networking class that can use named pipes, or TCP with the same calls.
Here is a typical loop back file transfer on our test system:
TCP/IP Transfer time: 2.5 Seconds
Named Pipes Transfer time: 3.1 Seconds
Now, if you go outside the machine and connect to a remote computer on your network the performance for named pipes is much worse:
TCP/IP Transfer time: 12 Seconds
Named Pipes Transfer time: 2.5 Minutes (Yes Minutes!)
I realize that this is just one system (Windows 7) But I think it is a good indicator of how slow named pipes can be...and it seems like TCP is the way to go.
I know this topic is very old, but it was still relevant for me, and maybe others will look at this in the future as well.
I implemented IPC between Excel (VBA) and another process on the same machine, both via a TCP connection as well as via Named Pipes.
In a quick performance test, I submitted a message that consisted of 26 bytes from the client (Excel) to the server (not Excel), and waited for the reply message from the other process (which consisted of 12 bytes in the example).
I executed this a ton of times in a loop and measured the average execution time.
With TCP on localhost (Windows 7, no fastpath), one "conversation" (request + reply) took around 300-350 microseconds. Sending data in particular was quite slow (sending the 26 bytes took around 200 microseconds via TCP).
With Named Pipes, one conversation took around 60 microseconds on average - so a LOT faster.
I'm not entirely sure why the difference was so large. The corporate environment I tested this in has a strict firewall, packet inspection and what not, so I THINK the localhost-based TCP connection may still have gone through security measures that significantly slowed it down, while the named pipe connection likely did not.
TL;DR: In my case, Named Pipes were around 5-6 times faster than TCP for small messages (I have not tested with bigger ones yet).
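For anyone who wants to repeat a similar measurement, here is a rough sketch of the TCP side of such a test in Python (26-byte request, 12-byte reply, averaged over many iterations, roughly mirroring the numbers above); the port is hypothetical, and the named-pipe side would need a Windows-specific API and is not shown:

```python
import socket
import threading
import time

HOST, PORT = "127.0.0.1", 5555   # hypothetical port
REQUEST = b"x" * 26
REPLY = b"y" * 12
ITERATIONS = 10_000

def server() -> None:
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((HOST, PORT))
    srv.listen(1)
    conn, _ = srv.accept()
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Partial reads are ignored for brevity; fine for a loopback micro-benchmark.
    while conn.recv(len(REQUEST)):
        conn.sendall(REPLY)

threading.Thread(target=server, daemon=True).start()
time.sleep(0.2)                                  # give the server time to start listening

cli = socket.create_connection((HOST, PORT))
cli.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

start = time.perf_counter()
for _ in range(ITERATIONS):
    cli.sendall(REQUEST)
    cli.recv(len(REPLY))
elapsed = time.perf_counter() - start
print(f"average round trip: {elapsed / ITERATIONS * 1e6:.1f} microseconds")
```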
http://msdn.microsoft.com/en-us/library/aa178138(v=sql.80).aspx
Let me sum it up for you. If you are worried about performance then use TCP/IP. But if you have a really fast network and you're not worried about performance, then Named Pipes would be "neat" in that they might save you some code.
Not to mention, if you stick to TCP then you will have something that can be scaled, and even load balanced when the time comes.
In the scenario you describe, the local TCP connections are very unlikely to be a bottleneck. It will introduce some overhead, of course, but this should be negligible unless your CPU is already running hot.
At a guess, if your server's CPU usage is normally below 50% or so (with the proxy in place) it isn't worth worrying about minimizing the overhead associated with the local TCP connections.
If CPU usage is regularly above 80% you should probably be doing some profiling. I'd start by comparing the CPU load (or, better still, the performance, if you can measure it meaningfully) when the proxy is in place to when it isn't. Unless the proxy is doing some complicated processing, the overhead associated with the extra TCP connections is probably a significant fraction of the total overhead introduced by the proxy, so that should give you at least an order-of-magnitude estimate of how much you'd gain by using a more efficient form of IPC.
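A hedged sketch of the "compare CPU load with and without the proxy" measurement, using psutil (the sampling interval and window length are arbitrary choices, not anything prescribed by the answer):

```python
import psutil

def average_cpu_percent(duration_s: int = 60, interval_s: float = 1.0) -> float:
    """Sample system-wide CPU usage and return the average over the window."""
    samples = []
    elapsed = 0.0
    while elapsed < duration_s:
        samples.append(psutil.cpu_percent(interval=interval_s))  # blocks for interval_s
        elapsed += interval_s
    return sum(samples) / len(samples)

# Run once with the proxy in place and once without, under comparable load,
# and compare the two averages.
print(f"average CPU: {average_cpu_percent():.1f}%")
```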
What is the reason to have a proxy on the SAME machine, just curious?
Anyway:
There are several methods for IPC; TCP/IP and named pipes are comparable in speed and complexity. If you really want something that scales well and has almost no overhead, use shared memory. It is best used in combination with a lock-free algorithm for advancing the pointers (or with one buffer for each reader (the proxy/the service) and writer (the service/the proxy)).
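A minimal sketch of the shared-memory idea using Python's standard multiprocessing.shared_memory module; the segment name, size and one-byte framing are arbitrary, and a real implementation would need the ring-buffer and synchronisation logic the answer alludes to:

```python
from multiprocessing import shared_memory

# Writer side (e.g. the proxy): create a named segment and place a message in it.
shm = shared_memory.SharedMemory(create=True, size=1024, name="proxy_ipc_demo")
payload = b"hello from the proxy"
shm.buf[0] = len(payload)                 # 1-byte length header, toy framing
shm.buf[1:1 + len(payload)] = payload

# Reader side (e.g. the service): attach to the same segment by name.
reader = shared_memory.SharedMemory(name="proxy_ipc_demo")
length = reader.buf[0]
print(bytes(reader.buf[1:1 + length]))

# Cleanup.
reader.close()
shm.close()
shm.unlink()
```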
