How does OBS record a hidden window? - winapi

I'm making a program that records a window even while it is obscured by another window, using Python and the Win32 API.
Through a lot of searching I succeeded in capturing the hidden window via its hwnd and BitBlt, but the execution time of my code is not stable.
I want to offer recording at a selectable 30-60 fps, but the time needed to capture the hidden window and write() the frame to the cv2 video object is irregular, so I can't produce a steady 60 fps video.
So I thought of OBS and Discord. OBS manages stable recording of obscured windows, and Discord has a feature that lets you pick a specific window and share it with several people in real time (this also works for hidden windows).
I'd like to know how these programs deliver stable video for occluded windows. I'm a student, not an expert, and analyzing the vast OBS source code on GitHub is beyond me, which is why I'm asking here. Can someone explain how these programs capture the screen?

Last time I checked, OBS was doing it with low-level hacks instead of public APIs.
Specifically, they have written a DLL which they inject into the target application using the CreateRemoteThread WinAPI function. The injected code then patches the application's code to intercept calls to the IDXGISwapChain::Present method. Once a call is intercepted, the injected code has access to the D3D frame buffer texture. It can copy that texture into another texture on the GPU and then do something with the copy. One possibility is DXGI surface sharing, to pass the copy from the target application to the capturing process. The APIs for that don't require both sides of the sharing to be in the same process; textures can be shared across processes just fine.
Unfortunately for you, their approach is borderline impossible to reimplement in a higher-level language like Python. Such things are really only doable in C++ or similar low-level languages, and they are relatively hard to implement and debug.
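For a sense of scale, here is a minimal ctypes sketch of the classic CreateRemoteThread/LoadLibraryA injection step (the function name inject_dll and its arguments are placeholders). This only loads a DLL into the target; the hook DLL itself, the part that actually patches Present, still has to be native code:
import ctypes
from ctypes import wintypes

k32 = ctypes.WinDLL("kernel32", use_last_error=True)

# Declare signatures so pointer-sized values survive on 64-bit Python.
k32.OpenProcess.restype = wintypes.HANDLE
k32.VirtualAllocEx.restype = wintypes.LPVOID
k32.VirtualAllocEx.argtypes = (wintypes.HANDLE, wintypes.LPVOID,
                               ctypes.c_size_t, wintypes.DWORD, wintypes.DWORD)
k32.WriteProcessMemory.argtypes = (wintypes.HANDLE, wintypes.LPVOID,
                                   ctypes.c_char_p, ctypes.c_size_t,
                                   ctypes.c_void_p)
k32.GetModuleHandleW.restype = wintypes.HMODULE
k32.GetProcAddress.restype = ctypes.c_void_p
k32.GetProcAddress.argtypes = (wintypes.HMODULE, wintypes.LPCSTR)
k32.CreateRemoteThread.restype = wintypes.HANDLE
k32.CreateRemoteThread.argtypes = (wintypes.HANDLE, ctypes.c_void_p,
                                   ctypes.c_size_t, ctypes.c_void_p,
                                   ctypes.c_void_p, wintypes.DWORD,
                                   ctypes.c_void_p)

PROCESS_ALL_ACCESS = 0x1F0FFF
MEM_COMMIT_RESERVE = 0x3000   # MEM_COMMIT | MEM_RESERVE
PAGE_READWRITE = 0x04

def inject_dll(pid, dll_path):
    # Open the target with enough rights to allocate memory and run a thread.
    process = k32.OpenProcess(PROCESS_ALL_ACCESS, False, pid)
    path = dll_path.encode("mbcs") + b"\x00"
    # Copy the DLL path string into the target's address space.
    buf = k32.VirtualAllocEx(process, None, len(path),
                             MEM_COMMIT_RESERVE, PAGE_READWRITE)
    k32.WriteProcessMemory(process, buf, path, len(path), None)
    # kernel32 is mapped at the same base in every process, so LoadLibraryA's
    # address in this process is also valid inside the target.
    loader = k32.GetProcAddress(k32.GetModuleHandleW("kernel32.dll"),
                                b"LoadLibraryA")
    # The remote thread simply runs LoadLibraryA(buf), loading the hook DLL.
    thread = k32.CreateRemoteThread(process, None, 0, loader, buf, 0, None)
    k32.WaitForSingleObject(thread, 0xFFFFFFFF)
    k32.CloseHandle(thread)
    k32.CloseHandle(process)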

#dy.kim, don't be afraid of large codebases. window-capture.c and the OBS GUI fairly obviously list "BitBlt" and "Windows Graphics Capture" as the two methods OBS uses to capture windows, with the preference going to WGC if neither is specified explicitly.
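For reference, the BitBlt path (the same one the question already uses) looks roughly like this with pywin32; the hwnd is assumed to come from win32gui.FindWindow or similar, and whether an obscured window yields pixels this way depends on how it renders:
import numpy as np
import win32con, win32gui, win32ui

def capture_window(hwnd):
    left, top, right, bottom = win32gui.GetClientRect(hwnd)
    w, h = right - left, bottom - top
    hwnd_dc = win32gui.GetWindowDC(hwnd)
    mfc_dc = win32ui.CreateDCFromHandle(hwnd_dc)
    save_dc = mfc_dc.CreateCompatibleDC()
    bmp = win32ui.CreateBitmap()
    bmp.CreateCompatibleBitmap(mfc_dc, w, h)
    save_dc.SelectObject(bmp)
    # Copy the (possibly obscured) window's client area into our bitmap.
    save_dc.BitBlt((0, 0), (w, h), mfc_dc, (0, 0), win32con.SRCCOPY)
    info = bmp.GetInfo()
    frame = np.frombuffer(bmp.GetBitmapBits(True), dtype=np.uint8)
    frame = frame.reshape((info["bmHeight"], info["bmWidth"], 4))
    # Release GDI objects so repeated captures don't leak handles.
    win32gui.DeleteObject(bmp.GetHandle())
    save_dc.DeleteDC()
    mfc_dc.DeleteDC()
    win32gui.ReleaseDC(hwnd, hwnd_dc)
    return frame  # BGRA pixels, ready for cv2.VideoWriter after a cvtColor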

Related

How can I programmatically interact with a video game GUI

Before I get shot down on this one, I realize that the 'how' answer to this question might be slightly debatable; however, I'm more interested in the 'what'.
In a nutshell, I want to know which methods I can use to interact with a PC video game interface. I want to create a program that can extract data from a video game market interface.
My first thought was that I would need to programmatically take screenshots and then use some optical character recognition software to extract the text, then run whatever operation on the extracted text to derive my insights.
Then I was thinking it might be easier to keep a bunch of mini screenshots and just look for matches in certain sections of the screen. When a match is found, I would know what text is on the screen without having to actually 'extract' it.
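That mini-screenshot idea is essentially template matching; a minimal OpenCV sketch, with made-up file names, would look like:
import cv2

screen = cv2.imread("screenshot.png")     # a full captured frame
needle = cv2.imread("market_button.png")  # the stored mini screenshot
# Slide the needle over the screen and score every position.
result = cv2.matchTemplate(screen, needle, cv2.TM_CCOEFF_NORMED)
_, best_score, _, best_loc = cv2.minMaxLoc(result)
if best_score > 0.9:  # the threshold is a judgment call
    print("found the button at", best_loc)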
For those out there who have done this, can you point me in one direction or the other? Perhaps there is a method I am completely unaware of.
If this question is not suitable for this forum, it would be much appreciated if you could direct me elsewhere.
Edit: I should probably add that I'm not looking to spend a fortune on this project, so free software would be best. Perhaps that's a tall order.
I'm starting to think Sikuli is the direction I'm going to go: open-source image recognition software that integrates with Python, Ruby, Java, JDBC, JavaScript and more.
-- Expanding on the question --
There are basically 3 categories of tools:
Recorder: while you manually work through your workflow, a recorder tracks your mouse and keyboard actions. After stopping the recording, you can play it back (autorun your workflow). The recordings can usually be edited and augmented with additional features.
GUI-aware: the tool allows you to programmatically operate on GUI elements like buttons. This is based on knowledge of the internal structures and names of the GUI elements and their features. Some of these tools also have a recording feature.
Visual: the tool "sees" images (usually rectangular pixel areas) on the screen and allows you to act on these images using mouse and keyboard simulation. Such tools might have some recorder feature as well.
SikuliX belongs to the third category and currently does not have a recorder feature.
In games with moddable UIs, like many MMOs, you could create a mod that streams data through a series of black-and-white squares that could be read with optical sensors. From there, a microcontroller could deliver the data back to the PC via USB or Wi-Fi.
My approach, as a noob: first determine whether OCR is 100% needed; I think this plays a role in speed.
If possible:
- run the game in a window (allows for easy troubleshooting)
- check whether the game has a high-contrast option; it will help Sikuli find things
Then you plan out your scenarios:
You have to create different functions for different situations. A lot of game automation is "do you see this?" then "do this" until that is gone.
Start with the small parts you want to automate, then build on them, making sure your parts can scale when small changes are needed (they will be). For instance, say you want to open the menu when you see an object, let's say a tree.
Assume you have some sort of walking algorithm.
setROI(region1)      # focus the search on the region where the tree appears
if exists(tree_img):
    click(location)  # or press the shortcut key that opens the menu
    click(item_img)  # if the item moves in the menu, scroll to find it first,
                     # or adjust the ROI until Sikuli can tell your item apart
                     # from ones you don't want to click
You would get that to loop into other actions and proceed, roughly as sketched below. Good luck.
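A rough Sikuli loop along those lines, with placeholder image names, might be:
while exists("tree.png"):          # "do you see this?"
    click("tree.png")              # "then do this"
    if exists("menu.png"):
        click("harvest_item.png")  # act on the menu entry
    # when the tree is gone, fall through to the next scenario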

On-screen display driver

I need to write a Linux kernel module that will display a message box over all other windows on the screen, and I need to draw the image in the kernel; access to this picture from a user-space application is not required. I don't understand how to do this. Which framework should I use: framebuffers or V4L? I suppose directly programming the display controller is not a good idea, because another driver in the kernel already does that. So the questions are: how do in-kernel drivers interact with each other, and how do I specify that my picture should be on top?
I would be grateful for any help.
What you want cannot be done this way, since the kernel does not handle GUIs and it especially does not handle window systems. It provides access to video output devices in one form or another, but all the actual drawing and compositing of the screen is done in user space.
Now, a kernel module would have the power to simply overwrite the frame buffer, but, as you noticed, there are multiple interfaces for different purposes. Additionally, using 3D rendering even for 2D desktops is quite common, and hijacking a 3D command stream for your purposes would be disproportionately difficult.
Even if you managed all that, there is no guarantee that the user-space window system wouldn't just overwrite your message box immediately, maybe even before it reaches the display.
So no, it can't be done in any practical way directly from the kernel. Your best alternative is a user-space daemon that displays the message on your kernel code's behalf through the standard channels, like any other GUI program.
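A minimal sketch of that daemon idea in Python, assuming the kernel module exposes its messages through some readable interface; /dev/osd_messages is a made-up name and notify-send is just one possible display channel:
import subprocess

# Read messages the kernel module publishes and display them like any
# other GUI program would; the device name and notify-send are assumptions.
with open("/dev/osd_messages") as events:
    for line in events:
        msg = line.strip()
        if msg:
            subprocess.run(["notify-send", "Kernel message", msg])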

Prevent program from using ClipCursor?

An old game (Pod) is kept alive with a glide wrapper and thus can now be run in custom resolutions larger than the game's native 640x480.
However, due to problems with the glide wrappers, if the game is run at 1920x1080, for example, the cursor can only move within a 0, 0, 640, 480 rectangle; evidently the original developers used the WinAPI ClipCursor function.
This is pretty nasty because you can't use the game menu with the mouse in any useful way, since not all buttons can be reached.
Is it possible to disable ClipCursor() functionality globally? Do I have to inject a DLL (I have never fully done that before), or would it be enough to let a C# app run in the background that watches for the game process and resets ClipCursor() to the real screen area once the process has started?
I seriously doubt it's calling ClipCursor() more than once. Try writing a small program that calls ClipCursor() and sets the clipping region back to the size of your desktop, and run it after your game has started.
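That idea is only a few lines, for instance in Python with ctypes; the polling loop is just insurance in case the game re-clips, and a one-shot call may well be enough:
import ctypes, time
from ctypes import wintypes

user32 = ctypes.WinDLL("user32")
SM_CXSCREEN, SM_CYSCREEN = 0, 1

while True:
    desktop = wintypes.RECT(0, 0,
                            user32.GetSystemMetrics(SM_CXSCREEN),
                            user32.GetSystemMetrics(SM_CYSCREEN))
    # Re-clip the cursor to the full screen; ClipCursor(None) would
    # remove the clipping rectangle entirely.
    user32.ClipCursor(ctypes.byref(desktop))
    time.sleep(1.0)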
Edit:
Depending on your skill level, you could also try using OllyDbg to step through the program, find where it calls the ClipCursor() API, and insert a jump to skip over it.

WP7: IsolatedStorage vs. WriteableBitmap

I have a scenario that I need some good, solid advice on. The question is really about the speed of WriteableBitmap vs. images in IsolatedStorage on Windows Phone.
I have an app that displays a UserControl (#1) which is a little graphically heavy. When the user swipes it, it transitions in a push-left type of transition to bring in a new UserControl (#2), which is also a little graphically heavy. If the user swipes the other way, control #1 is brought back in with the same type of push transition, this time from the right.
What I do today is take a snapshot of #1, load #2 off screen and take a snapshot of it, put both side by side in a Canvas control, and animate that control either left or right. One of the reasons I don't just animate the controls themselves is that they may have animations that start when they are loaded; my current technique lets me capture a screenshot pre-animation and post-animation, depending on which direction they go in.
What I'm wondering, however, is whether it would be better/faster to do the above only the first time, save the WriteableBitmap to IsolatedStorage with Extensions.SaveJpeg, and just use that image in subsequent transition animations.
Would load/render/WriteableBitmap each time generally be faster, or would loading the JPEG from IsolatedStorage be faster each time? I see that the Transitions control in the SDK doesn't really do either of these, so I'm open to different suggestions that might also improve performance.
I expect this to be very dependent on the hardware and the application, so it is pretty hard to give an answer from this input alone. It doesn't look too hard to test (on actual hardware and with the actual application), so my advice is to build both and measure.
The applications I have been working on use both approaches, and to be honest I haven't noticed much difference.
You might also try enabling bitmap caching on the controls. This gives you a writeable bitmap implementation that is very fast.

What language/libraries for an app that has a video preview window?

I want to make a simple assistant for putting together AviSynth scripts. This would be a Windows desktop application with a "preview" screen for an AVI movie, giving you a timeline, play, fast-forward, rewind, and frame-by-frame stepping in both directions. The program would need to know the frame number of the current frame in the player and its filename.
What language is best suited for this? I know PHP (I understand that this is not a contender) and am familiar with Java. My thought is that the biggest hurdle in this project will be finding a library for the video-playing features. At a cursory glance, no Java video libraries jumped out at me. My next thought would be C++.
The output of this program would be an AviSynth script, a plain-text file which looks like this:
AviSource("myAvi.avi")
Crop(0, 0, 320, 240)
Blur(0.1)
There are a few toolkits that can do this:
C#: DirectShow (DirectX)
Java: JMF
If you have AviSynth installed, the only thing you need for preview (if I understood correctly, that's your need) is something that can decode uncompressed video; the script would open like a normal file. I'm sure there are video players implemented fairly well in Java, but I don't know how much functionality you need from them. In any case, parsing scripts is not easy; I recommend you not try it if you don't need to.
EDIT: I'm sorry, I thought you needed a very specific app, but from what you seem to need, you don't need to code anything; use AVSP!
Please watch this video; it shows how straightforward it is. It has advanced functions such as auto-completion (even from your own auto-loading scripts!), syntax coloring, macros, automatic importing, drag & drop (of a video, for instance: just drag it onto the window and AVSP writes the loading line), and script preview with zoom. You can use automatic or custom sliders (for instance, a slider that rewrites a number in the script in real time for hue/luminosity/contrast/etc. that would be cumbersome to control via script), checkboxes and radio buttons (for boolean values, etc.), text fields that alter strings in real time, and basically anything you need. Please check it out.
Also, VirtualDubMod is OLD.
And yep, AVSP is free, both gratis and libre! =)
