I am searching for a convenient and, more importantly, reproducible way to measure the performance of a Remote Desktop session. (Both bandwidth AND latency)
Does anyone have an idea how this can be done? I have thought about measuring the bandwidth to the server, but I am not sure whether this is a good indicator, because it does not capture latency and responsiveness.
I am happy about any ideas, hints or resources to read!
There are performance counters for the "Terminal Services Session" object on the destination computer that you can use to see the maximum output frames and the compression ratio. The issue is that you need a controlled visual environment, such as a looping full-screen video, to flood the pipe. This will show you the maximum bandwidth used by a single client across your pipe.
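If you want to drive that from code instead of the perfmon UI, the counters are reachable from C# via System.Diagnostics. A minimal sketch (the counter names here are my assumptions; verify them in perfmon's "Add Counters" dialog, since they vary by Windows version):

    using System;
    using System.Diagnostics;
    using System.Threading;

    class TsSessionSampler
    {
        static void Main()
        {
            // NOTE: the counter names are assumptions -- check perfmon's
            // "Add Counters" dialog on the destination computer first.
            var category = new PerformanceCounterCategory("Terminal Services Session");
            foreach (string instance in category.GetInstanceNames())
            {
                var outBytes = new PerformanceCounter(
                    "Terminal Services Session", "Output Bytes", instance);
                var ratio = new PerformanceCounter(
                    "Terminal Services Session", "Output Compression Ratio", instance);

                outBytes.NextValue(); // a rate counter's first sample is always 0
                Thread.Sleep(1000);
                Console.WriteLine("{0}: output {1} bytes, compression ratio {2}",
                    instance, outBytes.NextValue(), ratio.NextValue());
            }
        }
    }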
Related
Sorry about the length, it's kinda necessary.
Introduction
I'm developing a remote desktop software (just for fun) in C# 4.0 for Windows Vista/7. I've gotten through basic obstacles: I have a robust UDP messaging system, relatively clean program design, I've got a mirror driver (the free DFMirage mirror driver from DemoForge) up and running, and I've implemented NAT traversal for all NAT types except Symmetric NATs (present in corporate firewall situations).
Regarding screen transfer/sharing, thanks to the mirror driver, I'm automatically notified of changed screen regions and I can simply marshal the mirror driver's ever-changing screen bitmap to my own bitmap. Then I compress the screen region as a PNG and send it off from the server to my client. Things are looking pretty good, but it's not fast enough. It's just as slow as VNC (btw, I don't use the VNC protocol, just a custom amateur protocol).
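For what it's worth, the per-region encode step described above looks roughly like this in C#/System.Drawing (a sketch of the approach as described, not the actual project code; `screen` stands in for the bitmap marshalled from the mirror driver):

    using System.Drawing;
    using System.Drawing.Imaging;
    using System.IO;

    static byte[] EncodeDirtyRegion(Bitmap screen, Rectangle dirtyRect)
    {
        // Copy just the changed region out of the full-screen bitmap...
        using (Bitmap region = screen.Clone(dirtyRect, screen.PixelFormat))
        using (var ms = new MemoryStream())
        {
            // ...and compress it losslessly as PNG before sending.
            region.Save(ms, ImageFormat.Png);
            return ms.ToArray();
        }
    }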
From the slowest remote desktop software to the fastest, the list usually begins at all VNC-like implementations, then climbs up to Microsoft Windows Remote Desktop...and then...TeamViewer. Not quite sure about CrossLoop, LogMeIn - I haven't used them, but TeamViewer is insanely fast. It's quite literally live. I ran a tree command on Command Prompt and it updated with 20 ms delay. I can browse the web just a few milliseconds slower than on my laptop. Scrolling code vertically in Visual Studio has 50 ms lag time. Think about how robust TeamViewer's screen-transfer solution must be to accomplish all this.
VNCs use poll-based hooks for detecting screen change and brute force screen capturing/comparing at their worst. At their best, they use a mirror driver like DFMirage. I'm at this level. And they use something called the RFB protocol.
Microsoft Windows Remote Desktop apparently goes one step higher than VNC. I heard, from somewhere on StackOverflow, that Windows Remote Desktop doesn't send screen bitmaps, but actual drawing commands. That's quite brilliant, because it can just send simple text (draw this rectangle at this coordinate and color it with this gradient)! Remote Desktop really is pretty fast - and it's the standard way of working from home. And it uses something called the RDP protocol.
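To make the contrast concrete, here is a toy sketch (purely illustrative, not the real RDP wire format) of a drawing-command message; a filled rectangle fits in about a dozen bytes, versus kilobytes for the same region as pixels:

    using System.IO;

    // Purely illustrative, NOT the real RDP order format: a filled-rectangle
    // "draw command" serializes to 13 bytes, where the same 100x100 region
    // as raw 24-bit pixels would be 30,000 bytes.
    struct FillRectCommand
    {
        public short X, Y, Width, Height;
        public int ArgbColor;

        public void Write(BinaryWriter w)
        {
            w.Write((byte)1);  // opcode: fill rectangle
            w.Write(X); w.Write(Y); w.Write(Width); w.Write(Height);
            w.Write(ArgbColor);
        }
    }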
Now TeamViewer is a complete mystery to me. Apparently, they released their source code for Version 2 (TeamViewer is Version 7 as of February 2012). People have read it and said that Version 2 is useless - that it's just a few improvements over VNC with automatic NAT traversal.
But Version 7...it's ridiculously fast now. I mean, it's actually faster than Windows Remote Desktop. I've streamed DirectX 3D games with TeamViewer (at 1 fps, but Windows Remote Desktop doesn't even allow DirectX to run).
By the way, TeamViewer does all this without a mirror driver. There is an option to install one, and it gets just a bit faster.
The Question
My question is: how is TeamViewer so fast? It shouldn't be possible. At 1920 by 1080 resolution and even 24-bit depth (16-bit depth would be noticeably ugly), that's still 6,220,800 bytes raw per frame. Even using libjpeg-turbo (one of the fastest JPEG compression libraries, used by large corporations), compressing that down to 30 KB (let's be extremely generous) would still take time to route through TeamViewer's servers (TeamViewer bypasses corporate Symmetric NATs by simply proxying traffic through its servers), and the compression itself takes time. High-quality JPEG compression takes 175 milliseconds for a full 1920 by 1080 screenshot for me, and that number goes up if the host's computer runs an Atom processor.

I simply don't understand how TeamViewer has optimized its screen transfer so well. Again, heavily compressed images are small to send, but take at least tens of milliseconds to compress; lightly compressed images take no time to produce, but take a long time to get through the network. Somehow, TeamViewer completes this entire process at roughly 20-25 frames per second. I've used a network monitor, and TeamViewer is still lagless at speeds of 500 Kbps and 1 Mbps (VNC software lags for a few seconds at that transfer rate). During my tree Command Prompt test, TeamViewer was receiving inbound data at a rate of 1 Mbps and still running at 5-6 fps. VNC and Remote Desktop don't do that. So, how?
The answers will be somewhat complicated and intricate, so please don't post your $0.02 if you're only going to say it's because they use UDP instead of TCP (believe it or not, they actually use TCP just as successfully).
I'm hoping there's a TeamViewer developer somewhere here on StackOverflow.
Potential Answers
Will update this once people reply.
My thoughts are, first of all, that TeamViewer has very fine network control. For example, they split large packets to just under the MTU size and never waste a trip. They probably have all sorts of fancy hooks to detect screen changes along with extremely fast XOR image comparisons.
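For the comparison part, a tile-based diff of consecutive frames is straightforward to sketch (a generic illustration; a serious implementation would XOR machine words or use SIMD rather than compare byte by byte):

    using System;
    using System.Collections.Generic;
    using System.Drawing;

    // Returns the 32x32-pixel tiles that differ between two raw 32bpp frames.
    static List<Rectangle> FindDirtyTiles(byte[] prev, byte[] curr, int width, int height)
    {
        const int tile = 32;
        var dirty = new List<Rectangle>();
        for (int ty = 0; ty < height; ty += tile)
        for (int tx = 0; tx < width; tx += tile)
        {
            int th = Math.Min(tile, height - ty);
            int tw = Math.Min(tile, width - tx);
            bool changed = false;
            for (int y = ty; y < ty + th && !changed; y++)
            {
                int rowStart = (y * width + tx) * 4; // 4 bytes per BGRA pixel
                for (int i = rowStart; i < rowStart + tw * 4; i++)
                    if (prev[i] != curr[i]) { changed = true; break; }
            }
            if (changed) dirty.Add(new Rectangle(tx, ty, tw, th));
        }
        return dirty;
    }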
The most fundamental thing here probably is that you don't want to transmit static images but only changes to the images, which essentially is analogous to video stream.
My best guess is some very efficient (and heavily specialized and optimized) motion compensation algorithm, because most of the actual change in generic desktop usage is linear movement of elements (scrolling text, moving windows, etc. opposed to transformation of elements).
The DirectX 3D performance of 1 FPS seems to confirm my guess to some extent.
would take time to route through TeamViewer's servers (TeamViewer bypasses corporate Symmetric NATs by simply proxying traffic through their servers)
You'll find that TeamViewer rarely needs to relay traffic through its own servers. TeamViewer gets through NATs, and networks complicated by NAT, using NAT traversal (I think it is UDP hole punching, like Google's libjingle).
They do use their own servers as a middle-man to do the handshake and connection set-up, but most of the time the relationship between client and server will be P2P (best case, when the handshake is successful). If NAT traversal fails, then TeamViewer will indeed relay traffic through its own servers.
I've only ever seen it do this when a client has been behind double-NAT, though.
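For reference, the punching step itself is small once a rendezvous server has told each peer the other's public endpoint. A minimal C# sketch (the rendezvous exchange is omitted, and `peerPublicEndpoint` is assumed to have been learned from such a server):

    using System;
    using System.Net;
    using System.Net.Sockets;
    using System.Text;

    static UdpClient PunchHole(int localPort, IPEndPoint peerPublicEndpoint)
    {
        var udp = new UdpClient(localPort);

        // Both peers fire datagrams at each other's public endpoint at roughly
        // the same time. Our first outbound packet opens a mapping in our own
        // NAT; once the peer's packet arrives through that mapping, the "hole"
        // is punched and traffic flows peer-to-peer.
        byte[] probe = Encoding.ASCII.GetBytes("punch");
        for (int i = 0; i < 5; i++)
            udp.Send(probe, probe.Length, peerPublicEndpoint);

        var remote = new IPEndPoint(IPAddress.Any, 0);
        udp.Receive(ref remote); // blocks until the peer's probe lands
        Console.WriteLine("Hole punched; peer reachable at {0}", remote);
        return udp;
    }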
It sounds indeed like video streaming more than image streaming, as someone suggested.
JPEG/PNG compression isn't targeted for these types of speeds, so forget them.
Imagine having a recording codec on your system that can realtime record an incoming video stream (your screen). A bit like Fraps perhaps. Then imagine a video playback codec on the other side (the remote client).
As hard-disk recorders can do it (record live and even play back live from the same disk), so should you, in the end. The disk surely can't deliver images quicker than you can read your display, so that isn't the bottleneck. The bottleneck is the video codecs. You'll find the encoder much more of a problem than the decoder, as the decoder side is mostly free.
I'm not saying it's simple; I myself have used DirectShow to encode a video file, and it's far from realtime. But given the right codec, I'm convinced it can work.
My random guess is: TeamViewer uses the x264 codec, which has a commercial license (otherwise TeamViewer would have to release their source code). At some point (more than 5 years ago), I recall the main developer of x264 wrote an article about improvements he made for low-delay encoding (if you delay by a few frames, encoders can compress better), plus he mentioned some other improvements that were relevant for TeamViewer-like use. In that post he mentioned playing Quake over a video stream with no noticeable issues. Back then I was fairly sure who the sponsor of those improvements was, as TeamViewer was pretty much the only option at that time.

x264 is an open-source implementation of the H.264 video codec, and it's an insanely good implementation; it's the best one. At the same time it's extremely well optimized. Most likely it is due to this extremely good implementation of x264 that you get much better results with TeamViewer at lower CPU load. AnyDesk and Chrome Remote Desktop use libvpx, which isn't as good as x264 (optimization- and video-quality-wise).
However, I don't think TeamViewer can beat Microsoft's RDP. To me it's the best, but it only works between Windows PCs, or from Mac to Windows. TeamViewer works even from mobiles.
Update: the article was written in January 2010, so that work was done roughly 10 years ago. Also, I made a mistake: he played Call of Duty, not Quake. When you posted your question, if my guess is correct, TeamViewer had already been using that work for 3 years.
You can read that blog post via the Web Archive: "x264: the best low-latency video streaming platform in the world". When I read the article back in 2010, I was sure that the "startup, which has requested not to be named" that the author mentions was TeamViewer.
Oddly, in my experience TeamViewer is not faster/more responsive than VNC, only easier to set up. I have a couple of win-boxen that I VNC into over OpenVPN (so there is another overhead layer), and that's on cheap cable (512 up), and I find a properly set up TightVNC to be much more responsive than TeamViewer to the same boxen. RDP (naturally) even more so, since in large part it sends GUI draw commands instead of bitmap tiles.
Which brings us to: why are you not using VNC? There is a plethora of open-source solutions, and Tight is probably at the top of its game right now. Advanced VNC implementations use lossy compression, and that seems to achieve better results than your choice of PNG. Also, IIRC, the rest of the payload is squashed using zlib. Both TightVNC and UltraVNC have very optimized algorithms, especially for Windows. On top of that, Tight is open source.

If Windows boxen are your primary target, RDP may be a better option, and it has an open-source implementation (rdesktop).

If *nix boxen are your primary target, NX may be a better option, and it has an open-source implementation (FreeNX, albeit not as optimised as NoMachine's proprietary product).
If compressing JPEG is a performance issue for your algorithm, I'm pretty sure that image comparison would still take away some performance. I'd bet they use best-case compression for every specific situation, i.e. lossy for large frames, some quick-and-dirty internal lossless for smaller ones, comparing bits of images and sending only diffs of a sort, plus a bunch of other optimisation tricks.
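That best-case-per-situation idea is easy to sketch: lossy JPEG for big regions, cheap lossless PNG for small ones (the pixel-count threshold below is an arbitrary placeholder, not a measured value):

    using System.Drawing;
    using System.Drawing.Imaging;
    using System.IO;
    using System.Linq;

    static byte[] CompressRegion(Bitmap region)
    {
        if (region.Width * region.Height > 10000) // arbitrary threshold for the sketch
        {
            // Large frame: lossy JPEG at moderate quality.
            ImageCodecInfo jpeg = ImageCodecInfo.GetImageEncoders()
                .First(c => c.MimeType == "image/jpeg");
            var args = new EncoderParameters(1);
            args.Param[0] = new EncoderParameter(Encoder.Quality, 60L);
            using (var ms = new MemoryStream())
            {
                region.Save(ms, jpeg, args);
                return ms.ToArray();
            }
        }
        // Small frame: quick lossless PNG.
        using (var ms = new MemoryStream())
        {
            region.Save(ms, ImageFormat.Png);
            return ms.ToArray();
        }
    }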
And a lot of those tricks must be present in Tight > 2.0, since, again, in my experience it beats the hell out of TeamViewer performance-wise. YMMV.
Also, the choice of a JIT-compiled runtime over something like C++ might take a slice from your performance edge, especially on memory-constrained machines (a lot of performance tuning goes down the toilet when Windows starts using the pagefile intensively). And you will need memory to keep previous image states for internal comparison, on top of what DFMirage gives you.
Does anybody know a good testing tool that can produce a graph of CPU cycles and RAM usage?
For example, I will run an application, and while the application is running the testing tool will record CPU cycles and RAM usage and produce a graph as output.
Basically, what I'm trying to test is how heavy a load an application puts on RAM and CPU.
Thanks in advance.
In case this is Windows the easiest way is probably Performance Monitor (perfmon.exe).
You can configure the counters you are interested in (such as Processor Time, Committed Bytes, etc.) and create a Data Collector Set that measures these counters at the desired interval. There are even templates for a basic System Performance Report, or you can add counters for the particular process you are interested in.
You can schedule the time window in which you want to run the sampling, and you will be able to see the result in PerfMon or export it to a file for further processing.
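If you'd rather drive the sampling from code than from the PerfMon UI, the same counters are reachable from C#. A sketch (the "myapp" instance name is hypothetical; it must match what perfmon shows for your process, usually the exe name without .exe):

    using System;
    using System.Diagnostics;
    using System.IO;
    using System.Threading;

    class ProcessSampler
    {
        static void Main(string[] args)
        {
            // "myapp" is a hypothetical process-instance name.
            string instance = args.Length > 0 ? args[0] : "myapp";
            var cpu = new PerformanceCounter("Process", "% Processor Time", instance);
            var ram = new PerformanceCounter("Process", "Working Set", instance);

            using (var csv = new StreamWriter("samples.csv"))
            {
                csv.WriteLine("time,cpu_percent,working_set_bytes");
                cpu.NextValue(); // a rate counter's first sample is always 0
                for (int i = 0; i < 60; i++)
                {
                    Thread.Sleep(1000);
                    csv.WriteLine("{0:o},{1},{2}",
                        DateTime.Now, cpu.NextValue(), ram.NextValue());
                }
            }
            // The CSV can then be graphed in Excel, gnuplot, etc.
        }
    }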
Video tutorial for the basics: http://www.youtube.com/watch?v=591kfPROYbs
A good sample showing how to monitor SQL Server:
http://www.brentozar.com/archive/2006/12/dba-101-using-perfmon-for-sql-performance-tuning/
LoadRunner is the best I can think of, but it's very expensive too! Depending on what you are trying to do, there might be cheaper alternatives.
Any tool which can hook into the standard Windows or 'NIX system utilities can do this. This has been a de facto feature set on just about every commercial tool for the past 15 years (HP, IBM, Micro Focus, etc.). Some of the web-only commercial tools (but not all) and the hosted services offer this as well. For the hosted services you will generally need to punch a hole through your firewall for them to get access to the hosts for monitoring purposes.
On the open-source front this is a totally mixed bag. Some have it, some don't. Some support one platform but not others (i.e. they support Windows but not 'NIX, or vice versa).
What tools are you using? It is unfortunately common for people to have performance tools in use and not be aware of their existing toolset's monitoring capabilities.
All of the major commercial performance testing tools have this capability, as well as a fair number of the open source ones. The ability to integrate monitor data with response time data is key to the identification of bottlenecks in the system.
If you have a commercial tool and your staff is telling you that it cannot be done then what they are really telling you is that they don't know how to do this with the tool that you have.
It can be done using JMeter: once you install the agent on the target machine, you just need to add the PerfMon monitor to your test plan.
It will produce 2 result files: the perfmon file and the requests log.
You can also build a plot that compares resource consumption to load and throughput. The throughput stops increasing when some resource's capacity is exceeded. In the example graph (not reproduced here), CPU time increases as the load increases.
JMeter perfmon plugin: http://jmeter-plugins.org/wiki/PerfMon/
I know this is an old thread, but I was looking for the same thing today, and as I did not find anything simple to use that produced graphs, I made this helper program for ApacheBench:
https://github.com/juanluisbaptiste/apachebench-graphs
It will run apachebench and plot the results and percentile files using gnuplot.
I hope it helps someone.
I am debugging an application which slows down the system very badly. The application loads a large amount of data (some 1,000 files, each of half an MB) from the local hard disk. The files are loaded as memory-mapped files and are mapped only when needed. This means that at any given point in time the virtual memory usage does not exceed 300 MB.
I also checked the handle count using handle.exe from Sysinternals and found that there are at most some 8,000-odd handles open. When the data is unloaded it drops to around 400. There are no handle leaks after each load and unload operation.
After 2-3 load/unload cycles, during one load, the system becomes very slow. I checked the virtual memory usage of the application as well as the handle count at this point, and both were well within limits (VM about 460 MB, not much fragmentation either; handle count 3,200).
I want to understand how an application could make the system so slow to respond. What other tools can I use to debug this scenario?
Let me be more specific: when I say "system", I mean that the whole of Windows is slowing down. Task Manager itself takes 2 minutes to come up, and it most often requires a hard reboot.
The fact that the whole system slows down is very annoying: it means you cannot easily attach a profiler, and it would even be difficult to stop a profiling session in order to view the results (since you said it requires a hard reboot).
The tool best suited for the job in this situation is ETW (Event Tracing for Windows). These tools are great and will give you the exact answer you are looking for.
Check them out here
http://msdn.microsoft.com/en-us/library/cc305210.aspx
and
http://msdn.microsoft.com/en-us/library/cc305221.aspx
and
http://msdn.microsoft.com/en-us/performance/default.aspx
Hope this works.
Thanks
Tools you can use at this point:
Perfmon
Event Viewer
In my experience, when things happen to a system that prevent Task Manager from popping up, they're usually of the hardware variety: the System log in Event Viewer is sometimes just full of warnings or errors that some hardware device is timing out.
If Event Viewer doesn't indicate that any kind of loggable hardware error is causing the slowdown, then try Perfmon: add counters for System objects to track file reads, exceptions, context switches, etc. per second, and see if there's something obvious there.
Frankly the sort of behavior demonstrated is meant to be impossible - by design - for user-mode code to cause. WinNT goes to a lot of effort to insulate applications from each other and prevent rogue applications from making the system unusable. So my suspicion is some kind of hardware fault is to blame. Is there any chance you can simply run the same test on a different PC?
If you don't have profilers, you may have to do the same work by hand...
Have you tried commenting out all read/write operations, just to check whether the slowdown disappears?
"Divide and conquer" strategies will help you find where the problem lies.
If you run it under an IDE, run it until it gets real slow, then hit the "pause" button. You will catch it in the act of doing whatever takes so much time.
You can use tools like IBM Rational Quantify or Intel VTune to detect performance issues.
[EDIT]
As Benoît did, one good approach is measuring task times to identify which one is eating CPU.
But remember: as you are working with many files, the likely culprit is page misses causing memory-to-disk swapping.
When Task Manager is taking 2 minutes to come up, are you getting a lot of disk activity, or is it CPU-bound?
I would try Process Explorer from Sysinternals. When your system is in the slowed-down state and you try running, say, Notepad, pay attention to the page fault deltas.
Windows is very greedy about caching file data. I would try removing file I/O as someone suggested, and also making sure you close the file mapping as soon as you are done with a file.
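In .NET terms (a sketch, assuming the app uses System.IO.MemoryMappedFiles or an equivalent native pattern), closing the mapping as soon as you are done means disposing both the view and the mapping promptly:

    using System.IO.MemoryMappedFiles;

    static byte[] ReadChunk(string path, long offset, int count)
    {
        // Dispose the view and the mapping as soon as the data is read,
        // so the OS can release the cached pages sooner.
        using (var mmf = MemoryMappedFile.CreateFromFile(path))
        using (var view = mmf.CreateViewAccessor(offset, count))
        {
            var buffer = new byte[count];
            view.ReadArray(0, buffer, 0, count);
            return buffer;
        }
    }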
I/O is probably causing your slowdown, especially if your files are on the same disk as the OS. Another way to test that would be to move your files to another disk and see if that alleviates the problem.
I am doing some performance measurement of my code on a Windows box and I am finding that I am getting dramatically different results between measurements. A quick bit of ad hoc exploration during a slow run shows, in Task Manager, the System Idle Process taking up almost 100% CPU.
Does anyone know what the System Idle Process actually means and what Windows features it may be running?
NB: I am not measuring performance using the task manager, I just used it to take a look at what else was running during a particularly slow measurement.
Please think before saying this is not programming related and closing the question. I would not ask it unless I thought there were grounds to say it is. In this case I believe it clearly is because it is detrimentally affecting my development and test environments and in order to sort it out I need to know a bit more about it. Programming does not start and end with the writing of the code.
The idle process typically doesn't do any useful work except execute the HLT instruction, which puts that CPU core into a lower power state (C1). However, the fact that your benchmark is not consuming 100% CPU time does open the door for speculation about what is going on.
If your application is single-threaded and your test system is multicore/hyperthreaded/multi-CPU, then you should expect to see around 50% idle CPU time for two cores, 75% for four, etc. This is because the CPU time percentages in Task Manager include all cores. (I believe that older versions of Windows had an option to change this, but I don't see it on Vista.)
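In other words, for a single-threaded benchmark pinning one core, the expected idle share is simple arithmetic (a trivial sketch, assuming no other load on the machine):

    // Expected Task Manager idle share when one busy thread pins one of
    // N logical cores (assuming no other load on the machine):
    static double ExpectedIdlePercent(int logicalCores)
    {
        return 100.0 * (logicalCores - 1) / logicalCores;
    }
    // ExpectedIdlePercent(2) == 50.0, ExpectedIdlePercent(4) == 75.0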
If the idle process is consuming a lot of CPU, that may indicate that your application is spending a lot of time sleeping. It might be waiting for data from some external source (e.g. a disk or network). It might be spending a lot of time waiting for synchronization objects (e.g. mutexes or events). It also might be spending a lot of time calling the Sleep() function. Profiling your code should identify where it's spending the time.
Getting fully reproducible benchmark results may require you to disable processor/disk/network-intensive background applications and services (e.g. search indexing, SMS software inventory, virus scanning, Windows Update downloads, IncrediBuild/distcc) or to connect the machine to an isolated network (or to no network at all).
I'm assuming that you wrote a benchmark for your application, and that you're just trying to use Task Manager to diagnose why the benchmark results aren't what you expected. Task Manager isn't an accurate way to measure application performance.
System Idle Process is a sort of default process that Windows runs on the processor when it has nothing else to schedule. This process acts like a housekeeper, doing things like trying to save system power, etc.
If you're measuring the performance of your program, don't use the Windows Task manager. Use Performance Monitor instead (which you can start by typing 'perfmon' at the commandline). Or better still, use a profiler.
If you "system idle process" is taking up 100% then essentially you machine is bored, nothing is going on. If you add up everything going on in task manager, subtract this number from 100%, then you will have the value of "system idle process." Notice it consumes almost no memory at all and cannot be affecting performance.
I'm looking for a very simple tool to monitor the bandwidth of all my applications.
No need for extra features like traffic spying; I'm just interested in bandwidth.
I already know Wireshark (which is great), but what I'm looking for is more like TcpView (a great tool from Sysinternals) with a current bandwidth indication.
PS: I'm interested in Windows tools only.
Try NetLimiter, which is great for that and also allows you to limit bandwidth usage so that you can test your app in reduced bandwidth scenarios.