I'm using a while loop in a kernel driver to read memory, but it causes high CPU usage.
In C# there is Sleep(), which reduces CPU usage inside a while loop.
What can I use for the same purpose in a kernel driver on Windows?
Related
I am confused about port mapping and ISRs.
I am following an article which says that hardware ports are mapped to memory from 0x00000000 to 0x000003FF.
So we can talk to the microcontroller of that hardware through these port numbers, using the IN and OUT instructions.
But then what is the IVT? I read that the IVT contains the addresses of interrupt service routines.
Everything is mixed up in my mind.
When we use IN/OUT with a port number, does the CPU check the IVT, and how do the microcontrollers know their port numbers?
When hardware ports are mapped to memory locations, this is called memory-mapped I/O. (By contrast, the x86 IN and OUT instructions address a separate I/O port space, which is port-mapped I/O.)
Hardware is accessed by reading and writing data or commands in its registers. In memory-mapped I/O, instead of sending data or commands to the hardware registers directly, the CPU reads and writes particular memory locations that are mapped to those registers. Communication between hardware and CPU therefore happens through reads and writes to specific memory locations.
When a piece of hardware is installed, it is given a fixed set of memory locations for memory-mapped I/O, and these locations are recorded. Each piece of hardware also has an ISR, whose address is stored in the IVT. When a particular device interrupts the CPU, the CPU looks up that device's ISR address in the IVT. Once the CPU has identified which device it needs to communicate with, it performs the I/O through the fixed memory locations that were allocated to that device.
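The lookup described above can be sketched as a small user-space simulation. This is only an illustration under stated assumptions, not real driver code: the device, its two registers, the vector number and every function name here are hypothetical, and an ordinary global struct stands in for the memory range that would be mapped to the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IVT_ENTRIES 256

/* Hypothetical device with two memory-mapped registers. */
typedef struct {
    volatile uint32_t status; /* bit 0 set: data is ready */
    volatile uint32_t data;   /* last value produced by the device */
} fake_device_regs;

typedef void (*isr_fn)(void);

static fake_device_regs device_regs; /* stands in for the mapped address range */
static isr_fn ivt[IVT_ENTRIES];      /* vector number -> ISR address */
static uint32_t last_value_read;

/* ISR of the fake device: it talks to the device only through its registers. */
void fake_device_isr(void)
{
    if (device_regs.status & 1u) {
        last_value_read = device_regs.data; /* read from the device */
        device_regs.status &= ~1u;          /* acknowledge the data */
    }
}

/* Record the ISR address in the table slot for this vector. */
void install_isr(uint8_t vector, isr_fn handler)
{
    assert(handler != NULL);
    ivt[vector] = handler;
}

/* What the CPU does on an interrupt: index the table, call the handler. */
int raise_interrupt(uint8_t vector)
{
    if (ivt[vector] == NULL)
        return -1; /* no ISR installed for this vector */
    ivt[vector]();
    return 0;
}

/* Simulated device side: place a value in the data register and flag it. */
void fake_device_produce(uint32_t value)
{
    device_regs.data = value;
    device_regs.status |= 1u;
}

uint32_t last_value(void)
{
    return last_value_read;
}
```

Calling install_isr(0x21, fake_device_isr), then fake_device_produce(42) and raise_interrupt(0x21) runs the ISR, which picks the value up from the data register. The vector only selects which handler runs; the actual data transfer goes through the device's registers.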
I'm doing some GPU computation using OpenCL, and I need to create a buffer of about 5 GB. My laptop has an integrated GPU with 1.5 GB of memory. I ran the code and it gave the wrong result, so I guess the GPU's memory is full. Is there some kind of "swap space" (or virtual memory) that the GPU can use when its memory is full? I know the CPU has this mechanism, but I'm not sure about the GPU.
No, it cannot (at least on most GPUs), because in general the GPU uses its own memory (the RAM on your graphics card).
Also, OpenCL kernel code does not call malloc (inside the kernel). You allocate buffers from the host with clCreateBuffer.
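If there is no swap space to fall back on, one possible workaround (an assumption on my part, and only usable when the computation can be split into independent pieces) is to process the data in chunks that each fit on the device. Here is a minimal sketch of the chunk arithmetic in plain C; chunk_count and chunk_size are hypothetical helper names. In a real program the per-buffer limit would come from clGetDeviceInfo with CL_DEVICE_MAX_MEM_ALLOC_SIZE, and each chunk would get its own clCreateBuffer call.

```c
#include <assert.h>
#include <stdint.h>

/* Number of chunks needed to cover total_bytes when no single
 * buffer may exceed max_chunk_bytes. */
uint64_t chunk_count(uint64_t total_bytes, uint64_t max_chunk_bytes)
{
    assert(max_chunk_bytes > 0);
    return total_bytes / max_chunk_bytes
         + (total_bytes % max_chunk_bytes != 0);
}

/* Size in bytes of chunk number `index` (0-based).
 * Only the last chunk may be shorter than max_chunk_bytes. */
uint64_t chunk_size(uint64_t total_bytes, uint64_t max_chunk_bytes,
                    uint64_t index)
{
    uint64_t offset = index * max_chunk_bytes;
    assert(offset < total_bytes);
    uint64_t remaining = total_bytes - offset;
    return remaining < max_chunk_bytes ? remaining : max_chunk_bytes;
}
```

For a 5 GiB data set and a hypothetical 384 MiB per-buffer limit, this gives 14 chunks, the last of which is 128 MiB.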
That depends on the GPU: whether it has an MMU and DMA access to host memory.
A GPU with an MMU can virtualize GPU and host memory so that they appear as a single address space, with accesses to physical host memory handled by DMA transfers. I would imagine that if your GPU had that capability, this would already be happening, in which case your problem is most probably elsewhere.
Processes in userspace are scheduled by the kernel scheduler to get processor time, but how do the different kernel tasks get CPU time? I mean, when no userspace process requires CPU time (so the CPU is idle, executing NOP instructions) but some kernel subsystem needs to carry out a task regularly, are timers and other hardware and software interrupts the common way to get CPU time in kernel space?
It's pretty much the same scheduler. The only difference I can think of is that kernel code has much more control over execution flow. For example, it can call the scheduler directly via schedule().
Also, in the kernel you have three execution contexts: hardware interrupt, softirq/bh, and process. In hard (and probably soft) interrupt context you can't sleep, so no scheduling takes place while code is running in those contexts.
Is it possible to map the memory of a Xilinx Virtex-5/7 FPGA into the virtual and/or physical address space of an Intel x86_64 CPU, and how is it done?
Ideally, I need a single unified address space with direct memory access (DMA) to the FPGA's memory from the CPU (like ordinary access to CPU RAM).
CPU: x86_64 Intel Core i7
OS: Linux kernel 2.6
Interface connection: PCI-Express 2.0 8x
You can in theory.
You'll need to write a fair amount of VHDL/Verilog to receive the PCIe packets and respond to them appropriately, driving the address, data and control lines of the internal memories ("BlockRAMs") to perform the reads and writes. I imagine, though, that treating all the BlockRAM as one massive memory is likely to cause routing congestion problems!
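On the CPU side, assuming the FPGA design exposes its BlockRAM through a PCIe BAR, one way to reach it from a Linux user process is to mmap the BAR's sysfs resource file. The sketch below is hedged accordingly: map_region and unmap_region are hypothetical helper names, the device path in the usage note is made up, and this gives plain memory-mapped access from the CPU rather than DMA.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `length` bytes of the file at `path` for read/write access.
 * Returns NULL on failure. */
volatile uint32_t *map_region(const char *path, size_t length)
{
    assert(length > 0);

    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    /* volatile: every access must really reach the mapped region */
    return (volatile uint32_t *)p;
}

/* Release a mapping obtained from map_region. */
void unmap_region(volatile uint32_t *base, size_t length)
{
    munmap((void *)base, length);
}
```

With a hypothetical device at 0000:01:00.0, a call such as map_region("/sys/bus/pci/devices/0000:01:00.0/resource0", 4096) would return a pointer through which the first 4 KiB of BAR 0 can be read and written like an array. This normally requires root privileges, and the length must not exceed the BAR size.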
If I use DMA for RAM <-> GPU transfers in CUDA C++, how can I be sure that the memory will be read from the pinned (page-locked) RAM, and not from the CPU cache?
After all, with DMA the CPU knows nothing about the fact that someone else changed the memory, or about the need to synchronize (cache <-> RAM). And as far as I know, a C++11 memory barrier (std::atomic_thread_fence()) does not help with DMA: it does not force a read from RAM, it only enforces consistency between the L1/L2/L3 caches. Furthermore, in general there is no protocol for resolving conflicts between the cache and RAM on the CPU; there are only protocols for synchronizing the different levels of CPU cache (L1/L2/L3) and multiple CPUs in NUMA systems: MOESI / MESIF.
On x86, the CPU does snoop bus traffic, so this is not a concern. On Sandy Bridge class CPUs, the PCI Express bus controller is integrated into the CPU, so the CPU actually can service GPU reads from its L3 cache, or update its cache based on writes by the GPU.