Is direct video card access possible? (No API) - Windows

I'm now somewhat experienced with OpenGL, which I started using because it's said to be the only way to invoke video card functions (besides DirectX, which I like less than OpenGL).
For programming (e.g. in C/C++) the OS provides many APIs, such as functions for printing. But these can be bypassed by coding in assembly language and calling much lower-level APIs (which gains speed) or issuing CPU instructions directly.
So I started wondering why this wouldn't be possible with the video card. Why should an API like OpenGL or DirectX be needed? The process going on with those is:
API call >
OS calls the video card (with complex opcodes, I think) >
video card responds (in a complex binary format) >
OS decodes this format and responds to the user (in the expected API format)
I believe this should decrease the speed of the rendering process.
So my question is:
Is there any possibility to bypass any graphical API (under Windows) and make direct calls to the video card?
Thanks,
Dennis

Using assembly or bypassing an API doesn't automatically make something faster; it is often slower, as you don't know what the folks who wrote the library know.
It is absolutely possible, yes. Those libraries are just processor instructions that poke and peek at registers and RAM, and you could just as easily poke and peek at those registers and RAM yourself. The first problem is whether you can get that information; sure, you can look at the Linux drivers or other open source resources. Second, much of the heavy lifting today is done in the graphics chip by logic or graphics processors, so the host is just a go-between and not necessarily the bottleneck, if there is a bottleneck at all. And yes, you can program the GPUs, depending on your video card/chip, etc.
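To make "poke and peek at registers" concrete, here is a minimal, purely illustrative C sketch; the base address and register offset are made up, and on real hardware they would come from the card's PCI BARs and need a proper (uncached) mapping first:

```c
#include <stdint.h>

/* Hypothetical addresses for illustration only; real values come from the
   video card's PCI configuration space (BARs). */
#define MMIO_BASE   0xFD000000u
#define REG_STATUS  0x04u

static inline uint32_t reg_read(uint32_t offset)
{
    /* volatile stops the compiler from caching or reordering the access */
    volatile uint32_t *reg = (volatile uint32_t *)(uintptr_t)(MMIO_BASE + offset);
    return *reg;                      /* "peek" a device register */
}

static inline void reg_write(uint32_t offset, uint32_t value)
{
    volatile uint32_t *reg = (volatile uint32_t *)(uintptr_t)(MMIO_BASE + offset);
    *reg = value;                     /* "poke" a device register */
}
```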
You need to determine where the bottleneck really is, if there is one at all. Maybe the bus is your problem, maybe the operating system, or the compiler, or the hard disk, or the system memory, the processor and architecture itself, the caches, etc. At the same time, how will you ever learn to find these things unless you try?
I recommend getting rid of Windows completely: no operating system, go bare metal. Take the Linux and other open source resources, plus anything you can get from the vendor, and get closer to the metal. You will also need a lot of information about the PCI/PCIe bus and bridges, DMA controllers, and everything else in the path. If you don't want to go that low, then use Linux or BSD or some other command-line environment where it is well known how to take over the video system, and take over the video system while retaining an operating system and a development environment (vi/emacs, gcc).
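For the "take over the video system while keeping an OS" route, a minimal sketch on Linux is to draw straight into the fbdev framebuffer from a text console (this assumes /dev/fb0 exists and nothing like X or Wayland owns the display; ioctl error checks are omitted for brevity):

```c
#include <fcntl.h>
#include <linux/fb.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/fb0", O_RDWR);          /* the kernel's framebuffer device */
    if (fd < 0) { perror("open /dev/fb0"); return 1; }

    struct fb_var_screeninfo v;
    struct fb_fix_screeninfo f;
    ioctl(fd, FBIOGET_VSCREENINFO, &v);          /* resolution, bits per pixel */
    ioctl(fd, FBIOGET_FSCREENINFO, &f);          /* memory length, line stride */

    uint8_t *fb = mmap(NULL, f.smem_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (fb == MAP_FAILED) { perror("mmap"); return 1; }

    /* paint a grey horizontal bar straight into video memory */
    for (uint32_t y = 100; y < 150 && y < v.yres; ++y)
        for (uint32_t x = 0; x < v.xres; ++x)
            for (uint32_t b = 0; b < v.bits_per_pixel / 8; ++b)
                fb[y * f.line_length + x * (v.bits_per_pixel / 8) + b] = 0x80;

    munmap(fb, f.smem_len);
    close(fd);
    return 0;
}
```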
If that is all way too advanced, then I recommend dabbling in simple GPU routines to get a feel for how the video card works, at least at some level, and tackling this learning exercise one step at a time.

Related

Bypassing the memory management system in Linux, Windows and OS X

I'm trying to research how to implement the memory management system in a DBMS. I know the theoretical part, but can't find anything about bypassing the memory management system in Linux, Windows or Mac OS X.
I'm trying to implement an efficient DBMS myself for fun, one that should be able to run on any common OS. In Database Systems: The Complete Book, they discuss the importance of fitting the relations/tables into adjacent sectors/blocks on the hardware for faster reads/writes. But I can't seem to find anything when crawling the web. Is there a way to bypass the memory management system, so that I can write to specific locations on disk, in such a way that the data saved there is placed in contiguous blocks and the file system can still recognize the data as a file?
I know that this is not a general coding question.
I'm used to writing in C, C#, F#, ML, Python and a little GNU assembly.
For clarification, I do not mean "which DBMS should I use"; I'm trying to implement the DBMS myself.
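For illustration, here is a rough sketch of the kind of control I'm after on Linux, using O_DIRECT so the DBMS manages its own buffering instead of relying on the OS page cache (the file name and block size are placeholders, and buffers, offsets and lengths must be aligned to the device block size):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const size_t block = 4096;                       /* assumed device block size */
    int fd = open("tablespace.dat", O_RDWR | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    void *buf = NULL;                                /* O_DIRECT needs an aligned buffer */
    if (posix_memalign(&buf, block, block) != 0) { perror("posix_memalign"); return 1; }
    memset(buf, 0xAB, block);

    /* write one page-sized block at an aligned offset, bypassing the page cache */
    if (pwrite(fd, buf, block, 8 * block) != (ssize_t)block) perror("pwrite");

    free(buf);
    close(fd);
    return 0;
}
```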

How do I code a wrapper that lies to an app running inside it?

I own a laptop with nVidia Optimus.
I tried everything to get rid of it, or to make it work, and it refuses to work.
One problem in particular is that when the WinAPI is asked for information about the hardware (for example queries about capabilities, device ID, device name, and so on), apps always get the information for the integrated Intel card, which is terrible and doesn't exactly match the nVidia card's capabilities either. This makes some games and apps misbehave or crash.
I was wondering: can I somehow override those WinAPI calls and make them lie? For example, when the app asks for the GPU device ID, I tell it that it is whatever arbitrary device I want.
Bonus question: can this also be applied to ASM-level instructions, like CPUID and RDTSC? Many older games rely on those... Also, code from the Intel compiler, infamously tuned to work well only on the P4, tends to treat new CPUs (Core i7 of any generation) as AMD and chooses crap code paths.
EDIT: Some people are misunderstanding what I want to code.
I want to make a launcher app to work around a common nVidia Optimus bug, like those apps that make games borderless, or that make them use a different, more compatible version of DirectX than their original one.
nVidia Optimus usually works (it can be done differently) by the machine having an integrated Intel chip and a discrete nVidia GPU. The computer treats the dGPU as a sort of video coprocessor: the actual video chip is always the Intel one, but when Optimus kicks in, hardware-accelerated rendering is handed to the dGPU, which, after finishing its work, copies the results into the Intel chip's framebuffer, which finally shows them on the screen.
The bug in this implementation is that it never considered what happens when an app queries the video capabilities: because the video chip is always the Intel one, any query gets a reply describing the Intel one, even if the chip that will actually receive the draw calls from this app is the nVidia one.
As a result, any mismatch in DX or OGL extensions between the GPUs can cause bugs or crashes, and programs may assume wrong things about the available computing power and memory, may have timing problems, and so on.
I've been fighting with this tech for years and have found no practical solution. This idea is my "final stand": make an "Optimus Launcher" app that lets you launch any game with Optimus and have it work, hopefully without ugly hacks like disabling Secure Boot (I disabled Secure Boot to play Age of Decadence; on machines with Optimus, AoD and other Torque3D games don't work if Secure Boot is enabled, and I have no idea why).
You can hook WinAPI calls and make them do whatever you like, but it's not something that is implemented easily. Furthermore, I guess that some antivirus programs will get very nervous if your application does stuff like that...
Take a look at this article, which is a good start: API hooking revealed
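To give a feel for what a hook can look like, here is a rough sketch using Microsoft's Detours library (my choice for brevity; the linked article uses its own technique). It detours EnumDisplayDevicesW and rewrites the adapter name, and the resulting DLL still has to be injected into the target process (for example with withdll.exe from the Detours samples):

```c
#include <windows.h>
#include <string.h>
#include <detours.h>

/* Pointer to the real function; Detours rewrites it to point at a trampoline. */
static BOOL (WINAPI *Real_EnumDisplayDevicesW)(LPCWSTR, DWORD, PDISPLAY_DEVICEW, DWORD)
    = EnumDisplayDevicesW;

static BOOL WINAPI Hooked_EnumDisplayDevicesW(LPCWSTR lpDevice, DWORD iDevNum,
                                              PDISPLAY_DEVICEW lpDisplayDevice, DWORD dwFlags)
{
    BOOL ok = Real_EnumDisplayDevicesW(lpDevice, iDevNum, lpDisplayDevice, dwFlags);
    if (ok && lpDisplayDevice) {
        /* Lie about the adapter name so the app believes it talks to the dGPU. */
        wcsncpy(lpDisplayDevice->DeviceString, L"NVIDIA GeForce GT 330M",
                ARRAYSIZE(lpDisplayDevice->DeviceString));
    }
    return ok;
}

BOOL WINAPI DllMain(HINSTANCE hinst, DWORD reason, LPVOID reserved)
{
    (void)hinst; (void)reserved;
    if (DetourIsHelperProcess()) return TRUE;
    if (reason == DLL_PROCESS_ATTACH) {
        DetourRestoreAfterWith();
        DetourTransactionBegin();
        DetourUpdateThread(GetCurrentThread());
        DetourAttach((PVOID *)&Real_EnumDisplayDevicesW, Hooked_EnumDisplayDevicesW);
        DetourTransactionCommit();
    } else if (reason == DLL_PROCESS_DETACH) {
        DetourTransactionBegin();
        DetourUpdateThread(GetCurrentThread());
        DetourDetach((PVOID *)&Real_EnumDisplayDevicesW, Hooked_EnumDisplayDevicesW);
        DetourTransactionCommit();
    }
    return TRUE;
}
```

Lying about Direct3D/DXGI adapter identifiers works the same way in principle, but those are COM methods, so you hook vtable entries rather than flat exports.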

Starting point for coding a virtual device

I want to write something like DaemonTools: software that presents itself to the system as a real device (a DVD-ROM drive in that example) but reads its data from a file instead. My requirement is not limited to DVD-ROM; the first goal is a joystick/gamepad for Windows.
I'm a web developer, so I don't know where I could start with such a project. I believe it will have to be written in C/C++, but other than that, I have no clue where to start.
Has anyone tried something like this who can give me some starting tips?
Most drivers are written in either C or C++, so if you don't know those languages reasonably well, you'll want to get familiar with them before you start. Windows programming uses a lot of interesting shortcuts that might be confusing to a beginner - for example PVOID (typedef void* PVOID;) and LPVOID (typedef void far* LPVOID;). You'll need to be comfortable with pointers as a concept, as well as with structures, because you'll be using a lot of them. I'd suggest writing a really straightforward Win32 app as an exercise in getting to grips with the Windows style of doing C/C++.
Your next port of call is the Windows Driver Kit - specifically, you'll need it to build drivers for Windows. At this stage my ability to advise really depends on what you're doing, the hardware you have available, and whether or not you're really using hardware at all. You'll need to know how to drive your hardware, and from there you'll need to choose an appropriate way of writing a driver - there are several different types of driver depending on what you need to achieve, and it may be that you can plug into one of these.
The Windows Driver Kit contains quite a large number of samples, including a driver that implements a virtual toaster. These should provide you with starting points.
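To give a feel for the shape of such a driver, here is a minimal KMDF skeleton (roughly what the WDK samples expand on; a real virtual joystick would additionally expose the right device interfaces and HID report descriptors):

```c
#include <ntddk.h>
#include <wdf.h>

DRIVER_INITIALIZE DriverEntry;
EVT_WDF_DRIVER_DEVICE_ADD EvtDeviceAdd;

/* Called by the framework for each device instance the PnP manager creates. */
NTSTATUS EvtDeviceAdd(WDFDRIVER Driver, PWDFDEVICE_INIT DeviceInit)
{
    WDFDEVICE device;
    UNREFERENCED_PARAMETER(Driver);
    /* Accept default settings; a real virtual device would add I/O queues,
       device interfaces, and so on here. */
    return WdfDeviceCreate(&DeviceInit, WDF_NO_OBJECT_ATTRIBUTES, &device);
}

NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
{
    WDF_DRIVER_CONFIG config;
    WDF_DRIVER_CONFIG_INIT(&config, EvtDeviceAdd);
    return WdfDriverCreate(DriverObject, RegistryPath,
                           WDF_NO_OBJECT_ATTRIBUTES, &config, WDF_NO_HANDLE);
}
```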
I strongly suggest you do your testing in a virtual machine. If your driver builds successfully but causes a runtime error, the result could well crash Windows entirely if you're in kernel mode. You will therefore save yourself some pain by being able to revert the virtual machine if you damage it, as well as by not having to wait for your system to restart. It will also make debugging easier, as virtual serial cables can be used.
This is quite a big undertaking, so before you start I'd research Windows development more thoroughly - check that you can't do it using the Windows APIs first, then have a look at the user-mode driver framework, and finally, only if you need to, look at the kernel-level stuff.

OpenCL distribution

I'm currently developing an OpenCL application for a very heterogeneous set of computers (using JavaCL, to be specific). In order to maximize performance I want to use a GPU if one is available, and otherwise fall back to the CPU and use SIMD instructions. My plan is to implement the OpenCL code using vector types, because my understanding is that this allows CPUs to vectorize the operations and use SIMD instructions.
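For example, a kernel written with vector types looks roughly like this (a hypothetical kernel; the idea is that each float4 maps naturally onto SSE/AVX lanes when a CPU implementation compiles it):

```c
/* OpenCL C: element-wise add over float4 vectors */
__kernel void add4(__global const float4 *a,
                   __global const float4 *b,
                   __global float4 *out)
{
    size_t i = get_global_id(0);
    out[i] = a[i] + b[i];      /* one work-item handles four floats at once */
}
```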
My question, however, is which OpenCL implementation to use. E.g. if the computer has an Nvidia GPU I assume it's best to use Nvidia's library, but if no GPU is available I want to use Intel's library to get the SIMD instructions.
How do I achieve this? Is this handled automatically, or do I have to include all the libraries and implement some logic to pick the right one? It feels like this is a problem that more people than me are facing.
Update
After testing the different OpenCL drivers, this is my experience so far:
Intel: crashed the JVM when JavaCL tried to call it. After a restart it didn't crash the JVM, but it also didn't return any usable devices (I was using an Intel i7 CPU). When I compiled the OpenCL code offline it seemed to be able to do some auto-vectorization, so Intel's compiler seems quite nice.
Nvidia: refused to install their WHQL drivers because it claimed I didn't have an Nvidia card (that computer has a GeForce GT 330M). When I tried it on a different computer I managed to get all the way to creating a kernel, but the first execution crashed the drivers (the screen flickered for a while and Windows 7 said it had to restart the drivers). The second execution caused a blue screen of death.
AMD/ATI: refused to install the 32-bit SDK (I tried that since I will be using a 32-bit JVM), but the 64-bit SDK worked well. This is the only driver I've managed to execute the code on (after a restart, because at first it gave a cryptic error message when compiling). However, it doesn't seem to be able to do any implicit vectorization, and since I don't have an ATI GPU I didn't get any performance increase compared to the Java implementation. If I use vector types I might see some improvements, though.
TL;DR None of the drivers seem ready for commercial use. I'm probably better off creating a JNI module with C code compiled to use SSE instructions.
First try to understand hosts & devices: http://www.streamcomputing.eu/blog/2011-07-14/basic-concept-hosts-and-devices/
Basically you can do exactly what you described: check if a certain driver is available and, if not, try the next one. Which one you try first depends completely on your own preference. I would pick the device I have tested my kernel on most. In JavaCL you can pick the fastest device with JavaCL.createBestContext and CLPlatform.getBestDevice; check the host code here: http://ochafik.com/blog/?p=501
Note that NVidia does not support CPUs via their driver; only AMD and Intel do. Also, targeting multiple devices (say, 2 GPUs and a CPU) is a bit more difficult.
There is no API providing exactly what you want. However, you can do the following:
I suggest you iterate over clGetPlatformIDs and, for each platform, query the number of devices (clGetDeviceIDs) and the type of each device, and prefer a platform that has both types.
Then build a map in your code that maps each device type to the list of platforms supporting it, ordered in some manner.
Finally, just take the first item in the list for CL_DEVICE_TYPE_CPU and the first item in the list for CL_DEVICE_TYPE_GPU.
If both results are equal (platform_cpu == platform_gpu), pick one of them and use it for both.
If there is a platform supporting both, you will get that match as before, since the lists are ordered. You can then also do load balancing on a single platform if you like, like what Intel offers.
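A rough host-side sketch of that iteration against the plain OpenCL C API (error handling mostly omitted; the fallback logic and the platform limit of 16 are simplifications):

```c
#include <stdio.h>
#include <CL/cl.h>

/* Return a platform exposing the preferred device type, else one with the fallback type. */
static cl_platform_id pick_platform(cl_device_type preferred, cl_device_type fallback)
{
    cl_uint num_platforms = 0;
    clGetPlatformIDs(0, NULL, &num_platforms);
    cl_platform_id platforms[16];
    if (num_platforms > 16) num_platforms = 16;
    clGetPlatformIDs(num_platforms, platforms, NULL);

    cl_platform_id fallback_platform = NULL;
    for (cl_uint i = 0; i < num_platforms; ++i) {
        cl_uint n = 0;
        if (clGetDeviceIDs(platforms[i], preferred, 0, NULL, &n) == CL_SUCCESS && n > 0)
            return platforms[i];                 /* e.g. first platform with a GPU */
        if (!fallback_platform &&
            clGetDeviceIDs(platforms[i], fallback, 0, NULL, &n) == CL_SUCCESS && n > 0)
            fallback_platform = platforms[i];    /* remember a CPU-only platform */
    }
    return fallback_platform;
}

int main(void)
{
    cl_platform_id p = pick_platform(CL_DEVICE_TYPE_GPU, CL_DEVICE_TYPE_CPU);
    if (!p) { fprintf(stderr, "no usable OpenCL platform\n"); return 1; }
    char name[256];
    clGetPlatformInfo(p, CL_PLATFORM_NAME, sizeof(name), name, NULL);
    printf("using platform: %s\n", name);
    return 0;
}
```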
Sorry for being late to the party, but regarding Intel's implementation behaviour under JavaCL, I'm afraid you've been bitten by a JavaCL bug:
https://github.com/ochafik/nativelibs4java/issues/297
Fixed in JavaCL 1.0.0-RC2 !
Cheers

Quick CPU ring mode protection question

I am very curious about messing with hardware, but my top-level "messing" so far has been linked or inline assembler in a C program. If my understanding of the CPU and ring modes is right, I cannot directly access some low-level CPU features from a user-mode app, like disabling interrupts or changing protected-mode segments, so I must use system calls to do everything I want.
But, if I am right, drivers can run in ring 0. I actually don't know much about drivers, but this is what I'm asking about: is learning how to write my own drivers, and then calling them, the way I should go to do what I described?
I know I could write a whole new OS (at least up to some point), but what I exactly want is to access some low-level features of the HW from a standard Windows application. So, is a driver the way to go?
Short answer: yes.
Long answer: managing access to low-level hardware features is exactly the job of the OS kernel, and if you only want access to a single feature there's no need to start your own OS from scratch. Most modern OSes, such as Windows, Linux, or the BSDs, allow you to add code to the kernel through kernel modules.
When writing a kernel module (or device driver), you write code that is going to be executed inside the OS kernel and will thus run in CPU ring 0. Great power comes with great responsibility, which in this case means that you should really know what you're doing, as there will be no pre-configured OS interface to prevent you from doing the wrong things. You should therefore study the manuals for your hardware (e.g., Intel's x86 software developer's manuals, device specs, ...) as well as standard operating-systems development literature (you'll also find plenty on the web - OSDev, OSDever, OSR, Linux Device Drivers).
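As a sense of scale, the Linux flavour of "add code to the kernel" can be as small as this classic module skeleton (the Windows equivalent goes through the WDK instead):

```c
#include <linux/module.h>
#include <linux/init.h>
#include <linux/kernel.h>

static int __init hello_init(void)
{
    pr_info("running in ring 0 now\n");   /* logged to the kernel ring buffer */
    return 0;
}

static void __exit hello_exit(void)
{
    pr_info("module unloaded\n");
}

module_init(hello_init);
module_exit(hello_exit);
MODULE_LICENSE("GPL");
```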
If you want to play with HW, write some programs for 16-bit real mode (or even with your own transition to protected mode). There you have to deal with ASM, BIOS interrupts, segments, video memory and a lot of other low-level stuff.
