I need to write a Linux kernel module that will display a message box over all other windows on the screen, and I need to draw the image in the kernel; access to this picture from a user-space application is not required. I don't understand how to do this. Which framework should I use: framebuffers or V4L? I suppose directly programming the display controller is not a good idea, because there is another driver in the kernel that already does this. So the questions are: how do in-kernel drivers interact with each other, and how do I specify that my picture should be on top?
I would be grateful for any help.
What you want cannot be done this way, since the kernel does not handle GUIs and it especially does not handle window systems. It provides access to video output devices in one form or another, but all the actual drawing and compositing of the screen is done in user space.
Now, a kernel module would have the power to just overwrite the frame buffer, but, as you noticed, there are multiple interfaces for different purposes. Additionally, using 3D rendering even for 2D desktops is quite common. Hijacking a 3D command stream for your purposes would be disproportionately difficult.
Even if you managed all that, there is no guarantee that the user-space window system wouldn't just overwrite your message box immediately, perhaps even before it reaches the display.
So no, it can't be done in any practical way directly from the kernel. Your best alternative would be a user space daemon displaying the message on your kernel code's behalf through the standard channels like any other GUI program.
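For example, the user-space side of that split might look like the sketch below. It assumes your kernel module exposes a character device (the name /dev/kmsgbox is made up) and writes one message per line; the daemon simply turns each line into a desktop notification via notify-send. This is an illustration of the idea, not a complete design.

# Hypothetical user-space daemon for a kernel module that wants to show messages.
# Assumptions: the module exposes /dev/kmsgbox (made-up name) and writes one
# message per line; notify-send (libnotify) is available on the desktop.
import subprocess

DEVICE = "/dev/kmsgbox"  # assumed to be created by your kernel module

def show_message(text: str) -> None:
    # Any GUI channel works here; notify-send is just the simplest to call.
    subprocess.run(["notify-send", "Kernel message", text], check=False)

def main() -> None:
    with open(DEVICE, "r") as dev:
        for line in dev:          # blocks until the module writes something
            line = line.strip()
            if line:
                show_message(line)

if __name__ == "__main__":
    main()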
I'm making a program to record a window that is obscured by another window, using Python and the Win32 API.
Through many searches, I succeeded in capturing the hidden window via its hwnd and BitBlt, but the execution time of my code is not stable.
I want to record at 30~60 fps, but the time required to capture the hidden window and write() it to the cv2 video object is irregular, so I can't produce a 60 fps video.
So I thought of OBS and Discord. OBS manages stable recording of obscured windows. Discord has a feature that lets you select a specific window and share it with multiple people in real time (this also works for hidden windows).
I'd like to know how these programs provide stable video for occluded windows. I'm a student, not an expert, and I'm asking here because the vast OBS source code on GitHub is difficult for me to analyze. Can someone explain how these programs capture the screen?
Last time I checked, OBS was doing it with low-level hacks instead of APIs.
Specifically, they wrote a DLL which they inject into the target application using the CreateRemoteThread WinAPI. Then they patch the application's code to intercept calls to the IDXGISwapChain.Present method. Once a call is intercepted, the injected code has access to the D3D frame buffer texture. It can copy that texture into another texture on the GPU and then do something with the copy. One possibility is DXGI surface sharing, to pass the copy from the target application to the capturing process. The APIs for that don't require both sides of the sharing to be in the same process; textures can be shared across processes just fine.
Unfortunately for you, their approach is borderline impossible to re-implement in higher-level languages like Python. Such things are only doable in C++ or similar low-level languages, and relatively hard to implement and debug.
#dy.kim, don't be afraid of large codebases. window-capture.c and the OBS GUI fairly obviously list "bitblt" and "Windows Graphics Capture" as the two methods it uses to capture windows, with the preference going to WGC if neither is specified.
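For reference, a minimal pywin32 sketch of the BitBlt/PrintWindow path the question already uses might look like this (the window title, the PW_RENDERFULLCONTENT flag, and the NumPy conversion are illustrative assumptions, not OBS's actual code):

# Hypothetical BitBlt/PrintWindow capture of a possibly occluded window.
import ctypes
import numpy as np
import win32con
import win32gui
import win32ui

hwnd = win32gui.FindWindow(None, "Untitled - Notepad")   # window title is a placeholder
left, top, right, bottom = win32gui.GetClientRect(hwnd)
w, h = right - left, bottom - top

hwnd_dc = win32gui.GetWindowDC(hwnd)
mfc_dc = win32ui.CreateDCFromHandle(hwnd_dc)
save_dc = mfc_dc.CreateCompatibleDC()
bmp = win32ui.CreateBitmap()
bmp.CreateCompatibleBitmap(mfc_dc, w, h)
save_dc.SelectObject(bmp)

# PrintWindow with PW_RENDERFULLCONTENT (2) can capture windows hidden behind
# others; a plain BitBlt from the window DC is the fallback.
PW_RENDERFULLCONTENT = 2
ok = ctypes.windll.user32.PrintWindow(hwnd, save_dc.GetSafeHdc(), PW_RENDERFULLCONTENT)
if not ok:
    save_dc.BitBlt((0, 0), (w, h), mfc_dc, (0, 0), win32con.SRCCOPY)

info = bmp.GetInfo()
frame = np.frombuffer(bmp.GetBitmapBits(True), dtype=np.uint8)
frame = frame.reshape((info["bmHeight"], info["bmWidth"], 4))   # BGRA pixels

# Clean up GDI objects.
win32gui.DeleteObject(bmp.GetHandle())
save_dc.DeleteDC()
mfc_dc.DeleteDC()
win32gui.ReleaseDC(hwnd, hwnd_dc)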
Before I get shot down on this one, I realize that the 'how' answer for this question might be slightly debatable, however I'm more interested in the 'what'.
In a nutshell, I want to know which methods I can use to interact with a PC video game interface. I want to create a program that can extract data from a video game's market interface.
My first thought was that I would need to programmatically take screenshots and then use some optical character recognition software to extract the text, then run whatever operation on the extracted text to derive my insights.
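As a rough sketch of that first idea (the region coordinates and the use of Pillow plus pytesseract are my assumptions; Tesseract itself has to be installed separately):

# Hypothetical screenshot + OCR loop; the bounding box is a placeholder.
from PIL import ImageGrab       # Pillow
import pytesseract              # Python wrapper around the Tesseract OCR engine

MARKET_REGION = (100, 200, 600, 400)   # left, top, right, bottom (made up)

screenshot = ImageGrab.grab(bbox=MARKET_REGION)
text = pytesseract.image_to_string(screenshot)
print(text)   # raw text to parse for item names, prices, quantities, etc.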
Then I was thinking it might just be easier to have a bunch of mini screen shots that I just use to find matches on certain sections of the screen. When a match is found, I would then know what the text is on the screen, without having to actually 'extract' it.
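And a minimal sketch of the "mini screenshot" idea using OpenCV template matching (the file name and threshold are placeholders):

# Hypothetical template match of a saved mini screenshot against the screen.
import cv2
import numpy as np
from PIL import ImageGrab

screen = cv2.cvtColor(np.array(ImageGrab.grab()), cv2.COLOR_RGB2BGR)
template = cv2.imread("buy_button.png")            # a saved mini screenshot
result = cv2.matchTemplate(screen, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)

if max_val > 0.9:                                  # confidence threshold; tune it
    print("Match found at", max_loc)               # top-left corner of the match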
For those out there who have done this, can you point me in one direction or the other? Perhaps there is a method that I am completely unaware of.
If this question is not suitable for this forum, it would be much appreciated if you could direct me elsewhere.
Edit: I should probably add that I'm not looking to spend a fortune on this project... so any free software would be the best. Perhaps that's a tall order.
I'm starting to think Sikuli is the direction I'm going to go. Open Source image recognition software, integrates with Python, Ruby, Java, JDBC, JavaScript and more.
-- Expanding on the question --
There are basically 3 categories of tools:
Recorder: while you manually work through your workflow, a recorder tracks your mouse and keyboard actions. After stopping the recording, you can play it back (autorun your workflow). The recordings can usually be edited and augmented with additional features.
GUI-aware: the tool lets you programmatically operate on GUI elements such as buttons. This relies on knowledge of the internal structures and names of the GUI elements and their features. Some of these tools also have a recording feature.
Visual: the tool "sees" images (usually rectangular pixel areas) on the screen and lets you act on these images using mouse and keyboard simulation. There might be some recorder feature in such a tool as well.
SikuliX belongs to the 3rd category and currently does not have a recorder feature.
Answer in progress...
In games with moddable UIs, like many MMOs, you could create a mod that streams data through a series of black and white squares that could be read with optical sensors. From there, a microcontroller could deliver the data back to the PC via USB or wifi.
My approach, as a noob: first determine whether OCR is really needed; I think this plays a big role in speed.
If possible:
- run the game in a window (makes troubleshooting much easier)
- check whether the game has a high-contrast option; it will help Sikuli find things
Then plan out your scenarios:
You have to create different functions for different situations. A lot of game automation is "do you see this?" then "do this" until it is gone.
Start with small parts you want to automate, then build on them, making sure they can cope with small changes, because changes will happen. For instance, say you want to open the menu when you see an object, let's say a tree.
Assume you have some sort of walking algorithm.
setROI(region1)       # focus the search on the region where the tree appears
if exists(tRee):      # tRee is the image pattern of the tree
    click(loCation)   # or type the shortcut key that opens the menu
    click(iTem)       # if the item moves around in the menu, scroll to find it
                      # first, or tighten the ROI until Sikuli can tell it apart
                      # from items you don't want to click
You would put that in a loop with your other actions and proceed. Good luck.
I have a point-of-sale application written in Perl/Tk. I use X11::GUITest to do automated testing of it, driving the app via hot-keys bound to the buttons and other widgets (it's normally touch-screen driven). However, X11::GUITest doesn't have a way to "read" text back from the screen, so I resort to augmenting the app to write temp files as well as putting data on the screen. The test scripts then look at the temp files, not the GUI. But I'd love to extend X11::GUITest or make a new CPAN module that can scrape text strings from X11 GUIs. I'm not after graphics-to-text conversion; it's my (faint) understanding that somewhere in the depths of the X window system, label text and such are stored as text strings and rendered to bitmap form late in the pipeline (?).
Anyone have guidance on how to do this, or pointers on where to start?
Yeah, I know I should've adhered to better MVC separation and not actually test at the GUI level, but just below it; however reality got in the way and it is what it is!
The best way to do this is to make your program work with an accessibility framework like ATK (used in GTK applications) and then use that to query for the strings as a screen reader would have to for text-to-speech translation. This is the approach taken by the Linux Desktop Testing Project and dogtail testing frameworks. You get the bonus of using existing, well tested code and making your application more usable by disabled users (as may be required by laws such as the Americans with Disabilities Act in the US and similar laws in many other countries).
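As a rough illustration of that route, reading a label through AT-SPI with dogtail might look like the snippet below. It assumes accessibility is enabled and that the application actually exposes an accessible tree (which a stock Perl/Tk app may not do without extra work); the application and widget names are made up.

# Hypothetical dogtail (AT-SPI) query; names below are placeholders.
from dogtail.tree import root

app = root.application("my-pos-app")        # assumed accessible application name
label = app.child(roleName="label")         # first label widget in the tree
print(label.name)                           # the label text exposed via AT-SPI

entry = app.child(roleName="text")          # a text-entry widget
print(entry.text)                           # its current contents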
If your application is using modern font frameworks like libXft2, this may be your only choice, as those strings exist only in the client application, not the X server, and the character-to-pixmap conversion is done in the client. (If your text is antialiased, it must be using these instead of the legacy X11 APIs.)
Even with the legacy X11 APIs, though, the X server doesn't store the strings once the text-to-bitmap conversion is done, so there's no good way to query them short of intercepting them.
The program listres lists resources in widgets, including label texts and I think the contents of text entry boxes. You may be able to use its output directly, extracting what you need, or you may need to look at the source and see how it's done.
I have a scenario that I need some good solid advice on. The question is really about speed of WriteableBitmap vs. images in IsolatedStorage on the Windows Phone.
I have an app that displays a UserControl (#1) which is a little graphically heavy. When the user swipes, it transitions with a push-left style animation to bring in a new UserControl (#2), which is also a little graphically heavy. If the user swipes the other way, control #1 is brought back in with the same kind of push transition, this time from the right.
What I do today is take a snapshot of #1, load #2 off screen and take a snapshot of it, put both side-by-side in a Canvas control and animate that control either left or right. One of the reasons I don't just use the controls and animate them is they may have animation that starts when they are loaded - my current technique allows me to capture a screen shot of pre-animation and post-animation, depending on which direction they go in.
What I'm wondering, however, is whether it would be better/faster to do the above only the first time, save the WriteableBitmap to IsolatedStorage with Extensions.SaveJpeg, and just use that image in subsequent transition animations.
Would load/render/WriteableBitmap each time generally be faster, or would loading the JPEG from IsolatedStorage be faster each time? I see that the Transitions control in the SDK doesn't really do either of these, so I'm open to different suggestions that might also improve performance.
I expect this to be very dependent on the hardware and the application, so it is pretty hard to give an answer based on this input. It doesn't look too hard to test (on actual hardware and with the actual application), so my advice is to build both and measure.
The applications I have been working with use both approaches and to be honest I haven't noticed much difference.
Also, you might try enabling bitmap caching on the controls. This will give you a writeable bitmap implementation that is very fast.
I'm working on a Cocoa application which will be used for a digital-signage/kiosk style display. I've never done anything like this with Cocoa before, but I'm trying to figure out what the best approach is for building the user interface for it.
My main issue is that I need a way to have the user interface scale up or down depending on the resolution of the display. When I say scaled, I mean that I want everything, including white space, to maintain the same sizing ratio. The aspect ratio of the interface needs to remain the same (16:9), but it should always fill the entire width of the display it's on.
Sorry if I'm not being descriptive enough.
What are some thoughts?
If I follow you correctly, you want all buttons and views, etc. to get larger, the bigger the screen is (which has nothing to do with the dimensions of your views). If that's the case, there's no automatic way to do this.
With Quartz Debugger (part of the Xcode Tools), you can set the scaling factor (see "resolution independence"), but this would need to be adjusted manually per system. What's more, I'm not sure whether this setting persists across reboots. I leave that for you to investigate.
As far as I know, though, there's no way to adjust this programmatically as resolution independence is still not an exposed consumer feature of OS X.
If anyone is interested, I seem to have found a solution under this post: http://cocoawithlove.com/2009/02/asteroids-style-game-in-coreanimation.html