How to improve RPC data throughput over high latency network - windows

I am working on client-server software using Microsoft RPC (over TCP) as the communication method. We sometimes transfer files from the client to the server. This works fine in local networks. Unfortunately, when we have a high latency, even a very wide bandwidth does not give a decent transfer speed.
Based on a Wireshark log, the RPC layer sends a bunch of fragments, then waits for an ACK from the server before sending more, and this causes the latency to dominate the transfer time. I am looking for a way to tell RPC to send more packets before pausing.
The issue seems to be essentially the same as with a TCP window that is too small, but there might be an RPC-specific fragment window at work here, since Wireshark does not show the TCP-level window being full. iPerf connection tests with a small window do give those warnings, and a speed similar to the RPC transfer. With larger window sizes, the iPerf transfer is three times faster than the RPC transfer, even with a reasonable (40 ms) latency.
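To make the window/latency relationship concrete, here is a rough back-of-the-envelope sketch. It assumes the sender may have at most one window of unacknowledged data in flight per round trip, which is the usual simplification and not something measured from the RPC runtime. The window sizes and the 100 Mbit/s link speed are hypothetical; only the 40 ms latency comes from the question.

```python
def max_throughput_bytes_per_s(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput when at most window_bytes may be unacknowledged."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_bytes / rtt_seconds


def window_needed_bytes(bandwidth_bits_per_s: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the link full."""
    return bandwidth_bits_per_s / 8 * rtt_seconds


if __name__ == "__main__":
    rtt = 0.040  # the 40 ms latency mentioned in the question
    for window in (64 * 1024, 1024 * 1024):  # hypothetical window sizes
        mbit = max_throughput_bytes_per_s(window, rtt) * 8 / 1e6
        print(f"{window // 1024} KiB window: at most {mbit:.1f} Mbit/s")
    # Hypothetical 100 Mbit/s link
    needed = window_needed_bytes(100e6, rtt)
    print(f"window needed to fill 100 Mbit/s: {needed / 1024:.0f} KiB")
```

Under this model, adding bandwidth does nothing once the window is the limit; only a larger window or a lower round-trip time raises the ceiling.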
I did find some mentions of an RPC fragment window on Microsoft's site (https://msdn.microsoft.com/en-us/library/gg604601.aspx) and in an RPC document (http://pubs.opengroup.org/onlinepubs/9629399/chap12.htm; search for window_size), but these seem to concern only connectionless (UDP) RPC. Additionally, they mention an RPC "fack" message, and I observed only regular TCP-level ACKs in the log.
My conclusion is that either the RPC layer is using a stupidly low TCP window, or it is limiting the number of fragment packets it sends at a time by some internal logic. Either way, I need to make it send more between ACKs. Is there some way to do this?
I could of course just transfer the file over multiple simultaneous connections, but that seems more like a work-around than a solution.
PS. I know RPC is not really designed for file transfer, but this is a legacy application and the RPC pipe deals with authentication and whatnot, so keeping the file transfer there would be best, at least for now.
PPS. I guess that if the answer to this question is a configuration option, this would be better suited for SuperUser, but an API setting would be ideal, which is why I posted this here.

I finally found a way to control this. The Microsoft documentation page "Configuring Computers for RPC over HTTP" lists registry settings that set the window sizes RPC uses, at least when used in conjunction with RPC over HTTP.
The two most relevant settings were:
HKLM\Software\Microsoft\Rpc\ClientReceiveWindow: DWORD
Making this higher (a few MB, specified in bytes) on the client machine made the download to the client much faster.
HKLM\Software\Microsoft\Rpc\InProxyReceiveWindow: DWORD
Making this higher on the server machine made the upload faster.
The downside of these options is that they are global. The first one will affect all RPC clients on the client machine and the latter will affect all RPC over HTTP proxying on the server. This may have serious caveats, but a tenfold speed increase is nothing to be scoffed at, either.
Still, setting these on a per-connection basis would be much better.
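For anyone wanting to try this, below is a small sketch that only generates the text of a .reg file for the two values named above; it does not touch the registry itself. The key path and value names are the ones quoted in this answer. The 4 MiB size is purely an assumption for illustration (the answer only says "some MB"), and the helper name rpc_window_reg_file is made up.

```python
from typing import Dict

# Key path as quoted in the answer above.
RPC_KEY = r"HKEY_LOCAL_MACHINE\Software\Microsoft\Rpc"


def rpc_window_reg_file(values: Dict[str, int]) -> str:
    """Return .reg file text that sets each named DWORD value under the RPC key."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{RPC_KEY}]"]
    for name, size_bytes in values.items():
        if not 0 <= size_bytes <= 0xFFFFFFFF:
            raise ValueError(f"{name} does not fit in a DWORD: {size_bytes}")
        # .reg files write DWORDs as eight hexadecimal digits.
        lines.append(f'"{name}"=dword:{size_bytes:08x}')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    four_mib = 4 * 1024 * 1024  # assumed size, not a recommendation
    # Client machine: larger receive window for downloads.
    print(rpc_window_reg_file({"ClientReceiveWindow": four_mib}))
    # Server machine: larger proxy receive window for uploads.
    print(rpc_window_reg_file({"InProxyReceiveWindow": four_mib}))
```

Since both settings are machine-wide, it is worth measuring with a few sizes before settling on one.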

Are there any significant performance benefits with HTTP2 multiplexing as compared with HTTP2 Server Push?

HTTP2 multiplexing uses the same TCP connection, thereby removing the connection setup time for further requests to the same host.
But with HTTP2 Server Push, are there any significant performance benefits apart from saving the round trip that HTTP2 multiplexing still needs when requesting each resource?
I gave a presentation about this, that you can find here.
In particular, the demo (starting at 36:37) shows the benefits that you can have with multiplexing alone, and then by adding HTTP/2 Push.
Spoiler: the combination of HTTP/2 multiplexing and Push yields astonishingly better results than HTTP/1.1.
Then again, every case is different, so you have to actually measure your case.
But the potential of HTTP/2 to yield better performance than HTTP/1.1 is really large, and many (most?) cases will benefit from this.
I'm not sure what exactly you're asking here, or if it's a good fit for StackOverflow, but I will attempt to answer nonetheless. If this is not the answer you are looking for, then please rephrase the question so we can understand what exactly it is you are looking for.
You are right that HTTP/2 uses multiplexing, which does negate the need for multiple connections (and the time and resources needed to set them up and manage them). However, it's much more than that, as the number of parallel requests is not limited the way connections are (browsers will typically limit connections to 4-6 per host), and it also allows "similar" connections (same IP and same certificate but different hostname) to share a connection as well. Basically, it solves the queuing of resources that the request/response method of HTTP/1 causes and reduces the need for the limited multiple connections that HTTP/1 requires as a workaround. This also reduces the need for other workarounds like sharding, sprite files, concatenation, etc.
And yes HTTP/2 server push saves on one round trip. So when you request a webpage it sends both the HTML and the CSS needed to draw the page as the server knows you will need the CSS as it's pointless just sending you the HTML, waiting for your web browser to get it, parse it, see it needs CSS and request the CSS file and wait for it to download.
I'm not sure if you're implying that round-trip time is so low that there is little to gain from HTTP/2 server push, because HTTP/2 multiplexing removes the delay in requesting a file. If so, that is not the case: there are significant gains to be made in pushing resources, particularly blocking resources like CSS, which the browser will wait for before drawing a single thing on screen. While multiplexing reduces the delay in sending a request, it does not reduce the latency of the request travelling to the server, nor of the server responding to it and sending the response back. While these sound small, they are noticeable and make a website feel slow.
So yes, at present, the primary gain for HTTP/2 Server Push is in reducing that round trip time (basically to zero for key resources).
However we are at the infancy of this and there are potential other uses for performance or other reasons. For example you could use this as a way of prioritising content so an important image could be pushed early when, without this, a browser would likely request CSS and Javascript first and leave images until later. Server Push could also negate the need for inline CSS (which bloats pages with copies of style sheets and may require Javascript to then load the proper CSS file) - another HTTP/1.1 workaround for performance. I think it will be very interesting to watch what happens with HTTP/2 Server Push over the coming years.
Having said that, there are still some significant challenges with HTTP/2 server push. Most importantly, how do you prevent wasting bandwidth by pushing resources that the browser already has cached? It's likely a digest HTTP header will be added for this, but that is still under discussion. This leads to the question of how best to implement HTTP/2 Server Push for web browsers, web servers and web developers. The HTTP/2 spec is a bit vague on how this should be implemented, which leaves different web servers to provide different methods of signalling that a resource should be pushed.
As I say, I think this is one of the parts of HTTP/2 that could lead to some very interesting applications. We live in interesting times...

Simulate slow speed for TCP sockets in Windows

I'm building an application that uses TCP sockets to communicate. I want to test how it behaves under slow-speed conditions.
There are similar questions on the site, but as I understand it, they deal with HTTP traffic, or are about Linux. My traffic is not HTTP, just ordinary TCP sockets, and the OS is Windows.
I tried using Fiddler's setting for Modem Speed, but it didn't work; it seems to work only for HTTP connections.
While it is true that you probably want to invest in an extensive set of unit tests, you can simulate various network conditions using VMware Workstation:
You will have to install a virtual machine for testing, setup bridged networking (for the vm to access your real network) and upload your code to the vm.
After that you can start changing the settings and see how your application performs.
NetLimiter can also be used, but it has fewer options (in your case, packet loss is very interesting to test and is not available in NetLimiter).
There is an excellent utility for Windows that can do throttling and much more:
https://jagt.github.io/clumsy/
I think you're taking the wrong approach here.
You can achieve everything that you need with some well designed unit tests. All of the things that a slow network link causes can be simulated in a unit test environment in controlled conditions.
Things that your code MUST handle to deal with "slow" links are just things that you should be dealing with anyway, including:
The correct handling of fragmented messages. All of your network reading code needs to correctly assume that each read will return between 1 byte and the size of your read buffer. You should never assume that you'll get complete 'messages' as TCP knows nothing of your concept of messages.
TCP flow control causing either your synchronous sends to fail with some form of 'try later' error or your async sends to succeed and potentially use an uncontrolled amount of resources (see here for more details). Note that this can happen even on 'fast' links if you are sending faster than the receiver is consuming.
Timeouts - again this isn't limited to "slow" links. All of your timeout handling code should be robust and tested. You may want to make sure that any read timeout is based on any read completing rather than reading a complete message in x time. You may be getting your data at a slow rate but whilst you're still getting data the link is alive.
Connection failure - again not something specific to "slow" links. You need to know how you deal with connections being reset at any time.
In summary, anything you can achieve by running your client and server on a simulated slow network can also be achieved with a decent set of unit tests, and everything that you would want to test on such a link is something that could affect any of your connections on any speed of link.
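As an illustration of the first point (reads returning anywhere from 1 byte to a full buffer), here is a minimal sketch of a unit-test style check. The 4-byte length-prefix framing and the names TrickleSocket, recv_exact, send_message and recv_message are assumptions made up for this example, not part of any particular application. The wrapper forces every read to return a single byte, which is the worst case a slow or congested link can produce.

```python
import socket
import struct


class TrickleSocket:
    """Test double: wraps a socket and returns at most max_chunk bytes per recv()."""

    def __init__(self, sock, max_chunk=1):
        self._sock = sock
        self._max_chunk = max_chunk

    def recv(self, bufsize):
        return self._sock.recv(min(bufsize, self._max_chunk))


def recv_exact(sock, n):
    """Keep reading until exactly n bytes have arrived, however they are split."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:  # an empty read means the peer closed the connection
            raise ConnectionError("peer closed before the full message arrived")
        buf.extend(chunk)
    return bytes(buf)


def send_message(sock, payload):
    """Send one message as a 4-byte big-endian length followed by the payload."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_message(sock):
    """Read one length-prefixed message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def demo():
    a, b = socket.socketpair()
    try:
        send_message(a, b"hello, slow link")
        # The reader only ever sees one byte per recv() call.
        return recv_message(TrickleSocket(b, max_chunk=1))
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(demo())
```

The same wrapper idea extends to the other points: a test double can raise a timeout or a reset at a chosen byte offset, which is hard to reproduce on demand over a real slow link.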

How to efficiently handle thousands of keep alive connections in Go?

Using golang's net/http server to handle connections, is there a pattern to better handle 10,000 keep alive connections with relatively low requests per second each?
My benchmark performance with something like wrk is 50,000 requests per second, and with real traffic (from realtime bidding exchanges) I have a hard time beating 8,000 requests per second.
I know connection multiplexing from a hardware loadbalancer is possible, but it seems like the same type of pattern can be achieved in Go.
You can distribute load on local and remote servers using an IPC protocol like JSON RPC through e.g. UNIX and TCP sockets.
Related: Go Inter-Process Communication
As to the performance bottleneck; it has been discussed extensively on the go-nuts mailing list. At the time of writing it is the runtime's goroutine scheduler and world-stopping garbage collector.
The core team has recently made major improvements to the runtime to alleviate this problem yet there still is room for improvement. To quote one example:
Due to tighter coupling of the run-time and network libraries, fewer context switches are required on network operations.

How slow are TCP sockets compared to named pipes on Windows for localhost IPC?

I am developing a TCP Proxy to be put in front of a TCP service that should handle between 500 and 1000 active connections from the wild Internet.
The proxy is running on the same machine as the service, and is mostly-transparent. The service is for the most part unaware of the proxy, the only exception being the notification of the real remote IP address of the clients.
This means that, for every inbound open TCP socket, there are two more sockets on the server: the second one of the pair in the proxy, and the one on the real service behind the proxy.
The send and recv window sizes on the two Proxy sockets are set to 1024 bytes.
What are the performance implications of this? How slow is this configuration? Should I put some effort into changing the service to use Named Pipes (or another IPC mechanism), or is a localhost TCP socket for the most part an efficient IPC mechanism?
The merge of the two apps is not an option. Right now we are stuck with the two process configuration.
EDIT: The reason for having two separate processes on the same hardware is 100% economics. We have one server only, and we are not planning on getting more (no money).
The TCP service is legacy software written in Visual Basic 6 that grew beyond our expectations. The proxy is C++. We don't have the time, money, or manpower to rewrite and migrate the VB6 code to a modern programming environment.
The proxy is our attempt to mitigate a specific performance issue on the service, a DDoS attack we are getting from time to time.
The proxy is open source, and here is the project source code.
It will be the same (or at least not measurably different). Winsock is smart enough to know if it's talking to a socket on the same host and, in that case, it will short-circuit pretty much everything below IP and copy data directly buffer-to-buffer. In terms of named pipes vs. sockets, if you need to potentially be able to communicate to different machines ever in the future, choose sockets. If you know for a fact that you'll never need to do that, pick whichever one your developers are most familiar or most comfortable with.
For anyone that comes to read this later, I want to add some findings that answer the original question.
For a utility we are developing we have a networking class that can use named pipes, or TCP with the same calls.
Here is a typical loop back file transfer on our test system:
TCP/IP Transfer time: 2.5 Seconds
Named Pipes Transfer time: 3.1 Seconds
Now, if you go outside the machine and connect to a remote computer on your network the performance for named pipes is much worse:
TCP/IP Transfer time: 12 Seconds
Named Pipes Transfer time: 2.5 Minutes (Yes Minutes!)
I realize that this is just one system (Windows 7), but I think it is a good indicator of how slow named pipes can be, and it seems like TCP is the way to go.
I know this topic is very old, but it was still relevant for me, and maybe others will look at this in the future as well.
I implemented IPC between Excel (VBA) and another process on the same machine, both via a TCP connection as well as via Named Pipes.
In a quick performance test, I submitted a message that consisted of 26 bytes from client (Excel) to server (not Excel), and waited for the reply message from the other process (which consisted of 12 bytes in the example).
I executed this a ton of times in a loop and measured the average execution time.
With TCP on localhost (Windows 7, no fastpath), one "conversation" (request+reply) took around 300-350 microseconds. Sending data in particular was quite slow (sending the 26 bytes took around 200 microseconds via TCP).
With Named Pipes, one conversation took around 60 microseconds on average - so a LOT faster.
I'm not entirely sure why the difference was so large. The corporate environment I tested this in has a strict firewall, packet inspection and whatnot, so I THINK even the localhost-based TCP connection went through security measures that slowed it down significantly, while the named pipe connection likely did not.
TL;DR: In my case, Named Pipes were around 5-6 times faster than TCP for small messages (I have not tested with bigger ones yet).
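A measurement like the one described above can be sketched as follows. This is not the VBA code from the test; it is a rough Python stand-in that sends a 26-byte request and waits for a 12-byte reply over a loopback TCP connection, then averages over many rounds. The function names and the number of rounds are made up, and the absolute numbers will include Python's own overhead, so they are only useful for comparing transports measured the same way.

```python
import socket
import threading
import time

REQUEST_SIZE = 26  # bytes, as in the test described above
REPLY_SIZE = 12    # bytes, as in the test described above


def _recv_exact(sock, n):
    """Read exactly n bytes; TCP may deliver them in several pieces."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def _serve(listener, rounds):
    """Accept one client and answer each request with a fixed-size reply."""
    conn, _ = listener.accept()
    with conn:
        for _ in range(rounds):
            _recv_exact(conn, REQUEST_SIZE)
            conn.sendall(b"r" * REPLY_SIZE)


def average_round_trip_seconds(rounds=1000):
    """Return the mean time of one request+reply conversation over loopback TCP."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    server = threading.Thread(target=_serve, args=(listener, rounds), daemon=True)
    server.start()
    client = socket.create_connection(listener.getsockname())
    try:
        start = time.perf_counter()
        for _ in range(rounds):
            client.sendall(b"q" * REQUEST_SIZE)
            _recv_exact(client, REPLY_SIZE)
        elapsed = time.perf_counter() - start
    finally:
        client.close()
        server.join(timeout=5)
        listener.close()
    return elapsed / rounds


if __name__ == "__main__":
    micros = average_round_trip_seconds() * 1e6
    print(f"average conversation: {micros:.0f} microseconds")
```

Running the same loop against a named pipe on the same machine would give the comparison; results will vary with firewall and endpoint-security software, as noted above.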
http://msdn.microsoft.com/en-us/library/aa178138(v=sql.80).aspx
Let me sum it up for you. If you are worried about performance then use TCP/IP. But if you have a really fast network and you're not worried about performance then Named Pipes would be "neat" in that it might save you some code.
Not to mention, if you stick to TCP then you will have something that can be scaled, and even load balanced when the time comes.
Cheers,
In the scenario you describe, the local TCP connections are very unlikely to be a bottleneck. It will introduce some overhead, of course, but this should be negligible unless your CPU is already running hot.
At a guess, if your server's CPU usage is normally below 50% or so (with the proxy in place) it isn't worth worrying about minimizing the overhead associated with the local TCP connections.
If CPU usage is regularly above 80% you should probably be doing some profiling. I'd start by comparing the CPU load (or, better still, the performance, if you can measure it meaningfully) when the proxy is in place to when it isn't. Unless the proxy is doing some complicated processing, the overhead associated with the extra TCP connections is probably a significant fraction of the total overhead introduced by the proxy, so that should give you at least an order-of-magnitude estimate of how much you'd gain by using a more efficient form of IPC.
What is the reason to have a proxy on the SAME machine, just curious?
Anyway:
There are several methods for IPC. TCP/IP and named pipes are comparable in speed and complexity. If you really want something that scales well and has almost no overhead, use shared memory. It is best used in combination with a lock-free algorithm for advancing the pointers (or use one buffer for each reader (the proxy/the service) and writer (the service/the proxy)).
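To show what the shared memory option looks like at its simplest, here is a small sketch using Python's multiprocessing.shared_memory module. It only demonstrates one writer handle and one reader handle attached to the same named segment; it does not implement the lock-free pointer advancing mentioned above, and a real proxy/service pair would attach from two separate processes and need its own signalling. The length-prefix layout, the sample payload, and the function names are assumptions for illustration.

```python
import struct
from multiprocessing import shared_memory

HEADER = struct.Struct("<I")  # 4-byte length prefix (hypothetical layout)


def write_message(buf, payload):
    """Copy payload into the segment, then publish its length in the header."""
    if HEADER.size + len(payload) > len(buf):
        raise ValueError("payload does not fit in the segment")
    buf[HEADER.size:HEADER.size + len(payload)] = payload
    HEADER.pack_into(buf, 0, len(payload))  # written last, after the data


def read_message(buf):
    """Read the length header and return a copy of the payload bytes."""
    (length,) = HEADER.unpack_from(buf, 0)
    return bytes(buf[HEADER.size:HEADER.size + length])


def demo():
    # "writer" stands in for one process, "reader" for the other; the reader
    # attaches by name, which is how a second process would find the segment.
    writer = shared_memory.SharedMemory(create=True, size=1024)
    try:
        reader = shared_memory.SharedMemory(name=writer.name)
        try:
            write_message(writer.buf, b"client 203.0.113.7 connected")
            return read_message(reader.buf)
        finally:
            reader.close()
    finally:
        writer.close()
        writer.unlink()  # remove the segment once nobody needs it


if __name__ == "__main__":
    print(demo())
```

The data itself is never copied through the kernel here, which is where the low overhead comes from; the hard part in practice is the synchronisation this sketch leaves out.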

High-Performance In-Browser Networking

(Similar in spirit to but different in practice from this question.)
Is there any cross-browser-compatible, in-browser technology that allows a high-performance persistent network connection between a server application and a client written in, say, Javascript? Think XMLHttpRequest on caffeine. I am working on a visualisation system that's restricted to at most a few users at once, and the server is pretty robust, so it can handle as much as it needs to. I would like to allow the client to have access to video streamed from the server at a minimum of about 20 frames per second, regardless of what their graphics hardware capabilities are.
Simply put: is this doable without resorting to Flash or Java?
I'm not sure what you mean by XMLHttpRequest on caffeine... the performance of a remote polling object like that is subject to the performance of the client and the server, not of the language constructs themselves. Granted, there is HTTP overhead in AJAX, but the only viable alternative is to use HTTP long polling, which basically keeps the server connection open longer and passes chunks of data down bit by bit in the background. It's literally the same as AJAX, except the connection stays open until something happens (thus moving the HTTP overhead to idle time).
If I recall correctly, Opera had some kind of sockets implementation a while back, but nobody uses Opera.
