Direct3D9 cannot access the GPU when the screen is minimized or locked, or when no display monitor hardware is attached. Is there a way to make Direct3D9 work even when no display monitor is connected to the machine? For example, could I create a virtual display monitor or hook some logic?
Thanks for your help.
I tried converting from Direct3D9 to Direct3D9Ex to make it work, but some memory pool types are not supported in Direct3D9Ex.
D3D9 normally cannot be run on a "headless" system. That feature is exclusive to modern D3D/DXGI. Unless your application must run on Windows XP or older OSes, it's probably easier to migrate to D3D11 or D3D12 at this point. XP's market share is vanishingly small these days, especially for users wanting to run D3D content (e.g. games).
See this MS Support Article and Microsoft Docs.
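If you do migrate, D3D11 can be created headless via the WARP software rasterizer. A minimal sketch (no window or swap chain; you'd render to offscreen textures instead):

    #include <d3d11.h>
    #pragma comment(lib, "d3d11.lib")

    int main()
    {
        // WARP is a software rasterizer, so device creation succeeds with
        // no monitor (or even no GPU) attached. Render targets can be
        // offscreen textures instead of a swap chain's back buffer.
        ID3D11Device* device = nullptr;
        ID3D11DeviceContext* context = nullptr;
        HRESULT hr = D3D11CreateDevice(
            nullptr,                  // default adapter
            D3D_DRIVER_TYPE_WARP,     // software device, no display required
            nullptr, 0,               // no software module, no flags
            nullptr, 0,               // default feature levels
            D3D11_SDK_VERSION,
            &device, nullptr, &context);
        return SUCCEEDED(hr) ? 0 : 1;
    }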
We are facing a weird problem on a Surface device running the Windows RT operating system.
When we play video from the CloudFront CDN through JW Player, the video takes a long time to load and buffers much more often than on other devices. Sometimes it stops playing altogether. We see the same problem when using an HTML5 video player.
When we tried to play the video on a Surface Pro 2, it worked fine.
What might be the problem here? Is it the CPU, GPU, RAM, or some browser issue specific to that device?
Simple JW Player example here: http://jsfiddle.net/hiteshbhilai2010/Ga55z/1/
Simple HTML5 player example here: http://jsfiddle.net/hiteshbhilai2010/dU6TF/
The GPU of the device is only used for decoding the video. If you can watch a video locally with no problems then you can almost certainly rule out the GPU as a bottleneck.
It doesn't seem very likely that the CPU is slowing you down, but you should be able to check how much CPU time the video player is taking through Task Manager or something similar.
The only two remaining factors I can think of that would affect you are RAM and your network hardware (I'm assuming you've already tried watching it on another device on the same connection).
I'd get some info on what network speeds your device and connection are capable of, and if you can rule that out, investigate how much RAM the browser is using.
I have made a custom video player in Flash built on the AS3 NetStream. In development it was never causing any significant CPU usage: YouTube/Vimeo are at about 10 to 15% CPU and my own player at 20 to 25%.
Now it's running on our development webserver and it is hogging the CPU.
I have tried setting the framerate unreasonably low (1fps) and it doesn't seem to make any significant impact.
We have experimented with WMODE in the HTML page that runs the player. With wmode: "direct" it is a little better, but still nowhere near the CPU usage in FlashDevelop.
I will gladly post all the code you think is relevant but at the moment I am at a loss for what could be causing this.
UPDATE:
Could it be related to the video file format?
UPDATE:
I have tried Chrome and Firefox on multiple computers. CPU usage varies according to the speed of the computer, as expected, but is always about 4 or 5 times as much as any other video player. So far we have found out that the high CPU compared to other players is caused by decompressing. If a smaller video format is used it works better. However, this doesn't answer the main question: why is the CPU usage within browser(s) so much higher than in standalone Flash?
There could be a difference in performance between environments, so please check the following things:
is FlashDevelop using a debug or release player?
is your browser using a debug or release player?
does it matter whether you make a release or debug build (in the Flash IDE this setting is called 'permit debugging')? Test on both the debug player AND the release player.
are you using Chrome's built-in Pepper player?
is your code valid and double-checked, with no runtime errors?
did you profile the Flash app for memory leaks?
are you using StageVideo? This renders video on the GPU, which should give better performance (by the way, YouTube and Vimeo do use it)
did you test with other videos, bitrates and encodings?
I disabled the plugin-container in Firefox (in about:config, set dom.ipc.plugins.enabled to false) and my Flex app seems to run as fast as in the standalone player now.
Sorry about the length, it's kinda necessary.
Introduction
I'm developing remote desktop software (just for fun) in C# 4.0 for Windows Vista/7. I've gotten past the basic obstacles: I have a robust UDP messaging system, relatively clean program design, I've got a mirror driver (the free DFMirage mirror driver from DemoForge) up and running, and I've implemented NAT traversal for all NAT types except symmetric NATs (present in corporate firewall situations).
Regarding screen transfer/sharing, thanks to the mirror driver, I'm automatically notified of changed screen regions and I can simply marshal the mirror driver's ever-changing screen bitmap to my own bitmap. Then I compress the screen region as a PNG and send it off from the server to my client. Things are looking pretty good, but it's not fast enough. It's just as slow as VNC (btw, I don't use the VNC protocol, just a custom amateur protocol).
From the slowest remote desktop software to the fastest, the list usually begins at all VNC-like implementations, then climbs up to Microsoft Windows Remote Desktop...and then...TeamViewer. Not quite sure about CrossLoop, LogMeIn - I haven't used them, but TeamViewer is insanely fast. It's quite literally live. I ran a tree command on Command Prompt and it updated with 20 ms delay. I can browse the web just a few milliseconds slower than on my laptop. Scrolling code vertically in Visual Studio has 50 ms lag time. Think about how robust TeamViewer's screen-transfer solution must be to accomplish all this.
VNCs use poll-based hooks for detecting screen change and brute force screen capturing/comparing at their worst. At their best, they use a mirror driver like DFMirage. I'm at this level. And they use something called the RFB protocol.
Microsoft Windows Remote Desktop apparently goes one step higher than VNC. I heard, from somewhere on StackOverflow, that Windows Remote Desktop doesn't send screen bitmaps, but actual drawing commands. That's quite brilliant, because it can just send simple text (draw this rectangle at this coordinate and color it with this gradient)! Remote Desktop really is pretty fast - and it's the standard way of working from home. And it uses something called the RDP protocol.
Now TeamViewer is a complete mystery to me. Apparently, they released their source code for Version 2 (TeamViewer is Version 7 as of February 2012). People have read it and said that Version 2 is useless - that it's just a few improvements over VNC with automatic NAT traversal.
But Version 7...it's ridiculously fast now. I mean, it's actually faster than Windows Remote Desktop. I've streamed DirectX 3D games with TeamViewer (at 1 fps, but Windows Remote Desktop doesn't even allow DirectX to run).
By the way, TeamViewer does all this without a mirror driver. There is an option to install one, and it gets just a bit faster.
The Question
My question is, how is TeamViewer so fast? It must not be possible. If you've got 1920 by 1080 resolution at even 24-bit depth (16-bit depth would be noticeably ugly), that's still 6,220,800 bytes raw. Even using libjpeg-turbo (one of the fastest JPG compression libraries, used by large corporations), compressing it down to 30 KB (let's be extremely generous) takes time, and the result would still take time to route through TeamViewer's servers (TeamViewer bypasses corporate symmetric NATs by simply proxying traffic through their servers). High-quality JPG compression takes 175 milliseconds for a full 1920 by 1080 screenshot for me, and that number goes up if the host's computer runs an Atom processor. I simply don't understand how TeamViewer has optimized their screen transfer so well. Again, small images might be highly compressed, but take at least tens of milliseconds to compress. Large images take no time to compress, but take a long time to get through. Somehow, TeamViewer completes this entire process at roughly 20-25 frames per second. I've used a network monitor, and TeamViewer is still lagless at speeds of 500 Kbps and 1 Mbps (VNC software lags for a few seconds at that transfer rate). During my tree Command Prompt test, TeamViewer was receiving inbound data at a rate of 1 Mbps and still running at 5-6 fps. VNC and Remote Desktop don't do that. So, how?
The answers will be somewhat complicated and intricate, so please don't post your $0.02 if you're only going to say it's because they use UDP instead of TCP (would you believe they actually do use TCP just as successfully though).
I'm hoping there's a TeamViewer developer somewhere here on StackOverflow.
Potential Answers
Will update this once people reply.
My thoughts are, first of all, that TeamViewer has very fine network control. For example, they split large packets to just under the MTU size and never waste a trip. They probably have all sorts of fancy hooks to detect screen changes along with extremely fast XOR image comparisons.
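To illustrate the XOR-comparison idea (a sketch of the general technique only, not TeamViewer's actual code): split each frame into tiles and mark a tile dirty on the first differing pixel, so unchanged regions cost almost nothing to skip.

    #include <cstdint>
    #include <vector>

    struct Tile { int x, y; };

    // Compare two 32-bit frames tile by tile; a tile is dirty on the
    // first XOR-differing pixel, so static regions bail out early.
    std::vector<Tile> DiffFrames(const uint32_t* prev, const uint32_t* curr,
                                 int width, int height, int tile = 64)
    {
        std::vector<Tile> dirty;
        for (int ty = 0; ty < height; ty += tile)
        {
            for (int tx = 0; tx < width; tx += tile)
            {
                bool changed = false;
                int w = (tx + tile < width) ? tile : width - tx;
                for (int y = ty; y < ty + tile && y < height && !changed; ++y)
                {
                    const uint32_t* a = prev + y * width + tx;
                    const uint32_t* b = curr + y * width + tx;
                    for (int x = 0; x < w; ++x)
                        if (a[x] ^ b[x]) { changed = true; break; }
                }
                if (changed) dirty.push_back({tx, ty});
            }
        }
        return dirty;
    }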
The most fundamental thing here probably is that you don't want to transmit static images but only changes to the images, which is essentially analogous to a video stream.
My best guess is some very efficient (and heavily specialized and optimized) motion compensation algorithm, because most of the actual change in generic desktop usage is linear movement of elements (scrolling text, moving windows, etc., as opposed to transformation of elements).
The DirectX 3D performance of 1 FPS seems to confirm my guess to some extent.
would take time to route through TeamViewer's servers (TeamViewer bypasses corporate Symmetric NATs by simply proxying traffic through their servers)
You'll find that TeamViewer rarely needs to relay traffic through their own servers. TeamViewer penetrates NATs using NAT traversal (I think it's UDP hole-punching, like Google's libjingle).
They do use their own servers as a middle-man in order to do the handshake and connection set-up, but most of the time the relationship between client and server will be P2P (best case, when the handshake is successful). If NAT traversal fails, then TeamViewer will indeed relay traffic through its own servers.
I've only ever seen it do this when a client has been behind double-NAT, though.
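For reference, the hole-punching step itself is simple enough to sketch (hypothetical code using Winsock; error handling and the rendezvous exchange are omitted): once the server has told each peer the other's public endpoint, both sides fire UDP packets at it simultaneously so each NAT opens a mapping for the other.

    #include <winsock2.h>
    #include <windows.h>
    #pragma comment(lib, "ws2_32.lib")

    // 's' is a UDP socket already bound to the local port the rendezvous
    // server observed; 'peer' is the other side's public address/port as
    // reported by the server. Both peers call this at roughly the same time.
    void PunchHole(SOCKET s, const sockaddr_in& peer)
    {
        const char probe[] = "punch";
        for (int i = 0; i < 10; ++i)
        {
            // Each outbound packet creates/refreshes a NAT mapping, which
            // then lets the peer's packets through on the same port pair.
            sendto(s, probe, sizeof(probe), 0,
                   reinterpret_cast<const sockaddr*>(&peer), sizeof(peer));
            Sleep(100);
        }
    }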
It sounds indeed like video streaming more than image streaming, as someone suggested.
JPEG/PNG compression isn't targeted for these types of speeds, so forget them.
Imagine having a recording codec on your system that can realtime record an incoming video stream (your screen). A bit like Fraps perhaps. Then imagine a video playback codec on the other side (the remote client).
As HD recorders can do it (record live and even play back live from the same hard disk), so should you, in the end. The hard disk surely can't deliver images quicker than you can read your display, so that isn't the bottleneck. The bottleneck is the video codec. You'll find the encoder much more of a problem than the decoder, as all decoders are mostly free.
I'm not saying it's simple; I myself have used DirectShow to encode a video file, and it's not realtime by far. But given the right codec I'm convinced it can work.
My random guess is: TeamViewer uses the x264 codec, which has a commercial license (otherwise TeamViewer would have to release their source code). At some point (more than 5 years ago), I recall the main developer of x264 writing an article about improvements he made for low-delay encoding (if you delay by a few frames, encoders can compress better), plus he mentioned some other improvements that were relevant for TeamViewer-like use. In that post he mentioned playing Quake over a video stream with no noticeable issues. Back then I was fairly sure who the sponsor of those improvements was, as TeamViewer was pretty much the only option at the time. x264 is an open-source implementation of the H.264 video codec, and it's an insanely good implementation; it's the best one, and extremely well optimized. Most likely it's thanks to x264's extremely good implementation that you get much better results with TeamViewer at lower CPU load. AnyDesk and Chrome Remote Desktop use libvpx, which isn't as good as x264 (optimization- and video-quality-wise).
However, I don't think TeamViewer can beat Microsoft's RDP. To me RDP is the best, but it only works between Windows PCs or from Mac to Windows. TeamViewer works even from mobiles.
Update: the article was written in January 2010, so that work was done roughly 10 years ago. Also, I made a mistake: he played Call of Duty, not Quake. When you posted your question, if my guess is correct, TeamViewer had already been using that work for 3 years.
Read that blog post from the Web Archive: x264: the best low-latency video streaming platform in the world. When I read the article back in 2010, I was sure that the "startup–which has requested not to be named" that the author mentions was TeamViewer.
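If that guess is right, the low-delay encoder setup would look roughly like this (a sketch against x264's public API; the actual settings TeamViewer might use are unknown):

    #include <stdint.h>
    extern "C" {
    #include <x264.h>
    }

    x264_t* OpenLowLatencyEncoder(int width, int height)
    {
        x264_param_t param;
        // "zerolatency" disables B-frames and lookahead so every frame can
        // be sent the moment it is encoded; "ultrafast" trades compression
        // ratio for encoding speed.
        x264_param_default_preset(&param, "ultrafast", "zerolatency");
        param.i_width   = width;
        param.i_height  = height;
        param.i_fps_num = 25;            // roughly the rate the question observes
        param.i_fps_den = 1;
        param.rc.i_rc_method = X264_RC_ABR;
        param.rc.i_bitrate   = 1000;     // kbps; matches the ~1 Mbps in the question
        return x264_encoder_open(&param);
    }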
Oddly, in my experience TeamViewer is not faster/more responsive than VNC, only easier to set up. I have a couple of win-boxen that I VNC into over OpenVPN (so there is another overhead layer), and that's on cheap cable (512 up), and I find a properly set up TightVNC to be much more responsive than TeamViewer to the same boxen. RDP is (naturally) even more so, since in large part it sends GUI draw commands instead of bitmap tiles.
Which brings us to:
Why are you not using VNC? There are a plethora of open source solutions, and Tight is probably at the top of its game right now.
Advanced VNC implementations use lossy compression, and that seems to achieve better results than your choice of PNG. Also, IIRC the rest of the payload is squashed using zlib (see the sketch after this list). Both Tight and UltraVNC have very optimized algos, especially for Windows. On top of that, Tight is open source.
If Windows boxen are your primary target, RDP may be a better option; it has an open-source implementation (rdesktop).
If *nix boxen are your primary target, NX may be a better option; it has an open-source implementation (FreeNX, albeit not as optimised as NoMachine's proprietary product).
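A sketch of that zlib squashing (real zlib API; the buffer handling here is just illustrative):

    #include <zlib.h>
    #include <vector>

    // Squash a non-image payload with zlib at a speed-biased level, the
    // way Tight-style protocols compress the parts that aren't lossy-coded.
    std::vector<unsigned char> Squash(const unsigned char* src, uLong srcLen)
    {
        uLongf destLen = compressBound(srcLen);
        std::vector<unsigned char> dest(destLen);
        if (compress2(dest.data(), &destLen, src, srcLen, Z_BEST_SPEED) != Z_OK)
            dest.clear();      // compression failed; send uncompressed instead
        else
            dest.resize(destLen);
        return dest;
    }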
If JPEG compression is a performance issue for your algo, I'm pretty sure image comparison would still take away some performance. I'd bet they use the best-case compression for every specific situation, i.e. lossy for large frames, some quick-and-dirty internal lossless for smaller ones, comparing bits of images and sending only diffs of a sort, and a bunch of other optimisation tricks.
And a lot of those tricks must be present in Tight > 2.0, since, again, in my experience it beats the hell out of TeamViewer performance-wise. YMMV.
Also, the choice of a JIT-compiled runtime over something like C++ might take a slice out of your performance edge, especially on memory-constrained machines (a lot of performance tuning goes down the toilet when Windows starts using the pagefile intensively). And you will need memory to keep previous image states for internal comparison on top of what DFMirage gives you.
When writing DirectX applications, obviously it's desirable to support the user suspending the application via Alt-Tab in a way that's fast and error-free. What is the best set of practices for ensuring this? Things that need to be addressed include:
The best methods of detecting when your application has been alt-tabbed out of and when it has been returned to.
What DirectX resources are lost when the user alt-tabs, and the best ways to cope with this.
Major things to do and things to avoid in application architecture for purposes of alt-tab support.
Any significant differences between major DirectX versions as they apply to the above.
Interesting tricks and gotchas are also good to hear about.
I will assume you are using C++ for the purposes of my answers, but if you can afford to use C#, XNA (http://creators.xna.com/) is an excellent game platform that handles all of these issues for you.
1]
This article is helpful for handling Windows events in the window procedure to detect when a window loses or gains focus; you could handle this on your main window: http://www.functionx.com/win32/Lesson05.htm. Also, check out the WM_ACTIVATEAPP message here: http://msdn.microsoft.com/en-us/library/ms632614(VS.85).aspx
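A minimal sketch of handling WM_ACTIVATEAPP in your window procedure (g_appActive is an illustrative flag your render loop would check):

    #include <windows.h>

    // Pause rendering/input when the app is deactivated.
    bool g_appActive = true;

    LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
    {
        switch (msg)
        {
        case WM_ACTIVATEAPP:
            // wParam is TRUE when the app gains focus, FALSE on Alt-Tab away.
            g_appActive = (wParam != FALSE);
            return 0;
        }
        return DefWindowProc(hwnd, msg, wParam, lParam);
    }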
2]
The graphics device is lost when the application loses focus from full screen mode. Microsoft offers an article on how to handle this: http://msdn.microsoft.com/en-us/library/bb174717(VS.85).aspx This article also has a lost device tutorial: http://www.codesampler.com/dx9src/dx9src_6.htm
DirectInput can also have a device lost error state, here is a link about that: http://www.toymaker.info/Games/html/directinput.html
DirectSound can also have a device lost error state, this article has code that handles that: http://www.eastcoastgames.com/directx/chapter2.html
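For the graphics device specifically, the usual per-frame pattern is roughly this (a sketch; OnLostDevice/OnResetDevice are placeholders for your own code that releases and recreates D3DPOOL_DEFAULT resources):

    #include <d3d9.h>

    void OnLostDevice();   // placeholder: release your D3DPOOL_DEFAULT resources
    void OnResetDevice();  // placeholder: recreate them after a successful Reset

    // Call once per frame before rendering; returns true if it is safe to draw.
    bool CheckDevice(IDirect3DDevice9* device, D3DPRESENT_PARAMETERS& pp)
    {
        HRESULT hr = device->TestCooperativeLevel();
        if (hr == D3DERR_DEVICELOST)
            return false;                 // lost, can't reset yet: skip this frame
        if (hr == D3DERR_DEVICENOTRESET)
        {
            OnLostDevice();               // default-pool resources must go first
            if (FAILED(device->Reset(&pp)))
                return false;             // still can't reset, try again next frame
            OnResetDevice();
        }
        return true;                      // D3D_OK: safe to render
    }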
3]
I would make sure never to disable Alt-Tab. You probably want minimal CPU load while the application is not active, because the user probably Alt-Tabbed away to do something else, so you could completely pause the application or reduce the frames rendered per second. If the application is minimized, you of course don't need to render anything at all. After thinking about a network game, my best answer is that you should still reduce the frames rendered per second as well as the number of network packets handled, possibly even throwing away many of the packets that come in until the game is re-activated.
4]
Honestly I would just stick to DirectX 9.0c (or DirectX 10 if you want to limit your target operating system to Vista and newer) if at all possible :)
Finally, the DirectX SDK has numerous tutorials and samples: http://www.microsoft.com/downloads/details.aspx?FamilyID=24a541d6-0486-4453-8641-1eee9e21b282&displaylang=en
We solved it by not using a fullscreen DirectX device at all - instead we used a full-screen window with the top-most flag to make it hide the task bar. If you Alt-Tab out of that, you can remove the flag and minimize the window. The texture resources are kept alive by the window.
However, this approach doesn't handle the device-lost event happening due to the lock screen, Ctrl+Alt+Delete, remote desktop connections, user switching or similar. But those don't need to be handled extremely fast or efficiently (at least that was the case in our application).
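A sketch of that windowed "fake fullscreen" setup in plain Win32 (hwnd being your existing render window):

    #include <windows.h>

    // Make an ordinary window cover the screen and sit above the task bar,
    // instead of using an exclusive-mode DirectX device.
    void EnterFakeFullscreen(HWND hwnd)
    {
        SetWindowLong(hwnd, GWL_STYLE, WS_POPUP | WS_VISIBLE);  // drop borders
        SetWindowPos(hwnd, HWND_TOPMOST, 0, 0,
                     GetSystemMetrics(SM_CXSCREEN), GetSystemMetrics(SM_CYSCREEN),
                     SWP_FRAMECHANGED | SWP_SHOWWINDOW);
    }

    // On Alt-Tab: stop being top-most and minimize, as described above.
    void LeaveFakeFullscreen(HWND hwnd)
    {
        SetWindowPos(hwnd, HWND_NOTOPMOST, 0, 0, 0, 0,
                     SWP_NOMOVE | SWP_NOSIZE);
        ShowWindow(hwnd, SW_MINIMIZE);
    }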
All serious D3D apps should be able to handle lost devices as this is something that can happen for a variety of reasons.
In DX10 under Vista there is a new "Timeout Detection and Recovery" feature that, in my experience, makes it common for graphics devices to be reset, which causes a lost device for your app. This seems to be improving as drivers mature, but you need to handle it anyway.
In DX8 and 9 (and 10?), if you create your resources (mainly vertex and index buffers and textures) using D3DPOOL_MANAGED, they will persist across lost devices and will not need reloading. This is because they are stored in system memory and the DX runtime copies them to video memory automatically. However, there is a performance cost due to the copying, and this is not recommended for rapidly changing vertex data. Of course, you would profile first to determine whether there is a speed issue :-)
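For example, a texture created like this survives a device reset without reloading, whereas a D3DPOOL_DEFAULT texture would have to be released before Reset() and recreated afterwards (sketch):

    #include <d3d9.h>

    // Managed-pool texture: the runtime keeps a system-memory copy and
    // restores the video-memory copy automatically after a device reset.
    IDirect3DTexture9* CreateManagedTexture(IDirect3DDevice9* device)
    {
        IDirect3DTexture9* tex = nullptr;
        device->CreateTexture(256, 256, 1, 0, D3DFMT_A8R8G8B8,
                              D3DPOOL_MANAGED,  // vs. D3DPOOL_DEFAULT, lost on reset
                              &tex, nullptr);
        return tex;
    }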