What happens if you call glBufferData on a buffer currently mapped with glMapBufferRange? I suggest it would be illegal, but I cannot find anything in the spec:
https://www.khronos.org/registry/OpenGL-Refpages/es3.0/html/glBufferData.xhtml
It is illegal in glDrawArrays spec.
Ok, additional challenge:
What if we have context resource sharing and the buffer is currently mapped in thread A with context A, then thread B on context B calls glBufferData on it?
For a single context scenario when glBufferData() is called then the existing buffer object is deleted, any active bindings of that resource in that context will be unbound, and any active mappings will be removed. If you call glUnmapBuffer() after glBufferData() from within the same context then you'll get a GL_INVALID_OPERATION error, because the state of the new version of the buffer is not initially mapped.
For a multi-context scenario it gets more complicated. OpenGL ES defines a weakly coherent state management model (to avoid expensive locking requirements on performance critical call paths).
Render state that belongs to the context (e.g. binding information, enable bits) is never modifiable by another context (can be implemented lockless).
Resource state that belongs to an object (e.g. buffers, samplers, textures) is weakly coherent. A context will see its own changes immediately, but only pick up changes written by another context when it binds the resource (only needs locking on bind changes).
Resource data payload is not coherent at all. If you want to ensure that data is available from one context in another then you must include manual synchronization between the threads.
Thread A calling glMapBuffer() will create a copy of the master state of Buffer "version 1", including a local state setting that the buffer is "mapped".
Thread B calling glBufferData() will create a new version of the buffer resource "version 2", but this will not impact the state held by Thread A which will continue to reflect the state at the time that the buffer was bound in thread A (version 1).
Thread A calling glUnmapBuffer() will work fine, because it will unmap buffer "version 1" (the mapped state is local to the Thread A context, and that still says the buffer is "mapped").
Note that the data contents of the buffer that Thread A sees after Thread B calls glBufferData() is unpredictable (it could be the old data, it could be the new data), in accordance with the design that data is not coherent at all. If there were no pending draw operations then it is valid for the driver to simply reuse the memory that "version 1" of the buffer was backed by to contain the content that was uploaded for "version 2". If you want guarantees about data consistency across contexts then you need manual synchronization (it's conceptually just like having two threads call glBufferData() on the same buffer at the same time).
I'd recommend reading Chapter 5 of the OpenGL ES 3.2 spec.
Related
I wrote an interrupt handler in linux.
Part of the handler logic I need to access physical address.
I used iormap function but then I fell into KDB during handler time.
I started to debug it and i saw the below code which finally called by ioremap
What should I do? Is there any other way instead of map the region before?
If i will need to map it before it means that i will probably need to map and cache a lot of unused area.
BTW what are the limits for ioremap?
Setting up a new memory mapping is an expensive operation, which typically requires calls to potentially blocking functions (e.g. grabbing locks). So your strategy has two problems:
Calling a blocking function is not possible in your context (there is no kernel thread associated with your interrupt handler, so there is no way for the kernel to resume it if it had to be put to sleep).
Setting up/tearing down a mapping per IRQ would be a bad idea performance-wise (even if we ignore the fact that it can't be done).
Typically, you would setup any mappings you need in a driver's probe() function (or in the module's init() if it's more of a singleton thing). This mapping is then kept in some private device data structure, which is passed as the last argument to some variant of request_irq(), so that the kernel then passes it back as the second argument to the IRQ handler.
Not sure what you mean by "need to map and cache a lot of unused area".
Depending on your particular system, you may end up consuming an entry in your CPU's MMU, or you may just re-use a broader mapping that was setup by whoever wrote the BSP. That's just the cost of doing business on a virtual memory system.
Caching is typically not enabled on I/O memory because of the many side-effects of both reads and writes. For the odd cases when you need it, you have to use ioremap_cached().
I want to render a object with a dynamic vertex buffer and I do rendering in UI thread. I am thinking is it possible to change this vertex buffer content in a non UI thread using Map and Unmap.
Thanks.
YL
The Direct3D 11 multi-threading model is fairly simple:
Calls to the ID3D11Device are thread-safe (unless you used the D3D11_CREATE_DEVICE_SINGLETHREADED flag when you created the device). You can call the methods on this interface from any thread.
Calls to the ID3D11DeviceContext11 are not thread-safe, and you should only call methods on this interface for a given context from a single thread at a time.
This is why Map and Unmap are part of the ID3D11DeviceContext11 rather than ID3D11Device or on the ID3D11Resource itself like it was in Direct3D 10. The operation is inherently serial with other operations.
This means you should have a single thread using the immediate device context (and DXGI), and this should probably be the same thread as your main windows message pump (for the reasons covered in DirectX Graphics Infrastructure (DXGI): Best Practices.
You could Map on the same thread as the one using the immediate context, marshal the pointer to another thread, and then Unmap it from the original thread when that thread completes but this is highly unlikely to improve performance.
See Introduction to Multithreading in Direct3D 11
Let's say process p1 is executing with its own address space(stack,heap,text). When context switch happens, i understand that all the current cpu registers are pushed into PCB before loading process p2. Then TLB is flushed and loaded with p2 address mapping and starts executing with its own address spaces.
What i would like know is the state of p1 address space. Will it be copied to disk and updates its page table before loading process p2?
The specifics of a context switch depend upon the underlying hardware. However, context switches are basically the same, even among different system.
The mistake you have is " i understand that all the current cpu registers are pushed into stack before loading process p2". The registers are stored in an area of memory that is usually called the PROCESS CONTEXT BLOCK (or PCB) whose structure is defined by the processor. Most processors have instructions for loading and saving the process context (i.e., its registers) into this structure. In the case of Intel, this can require multiple instructions saving to multiple blocks because of all the different register sets (e.g. FPU, MMX).
The outgoing process does not have to be written to disk. It may paged out if the system needs more memory but it is possible that it could stay entirely in memory and be ready to execute.
A context switch is simply the exchange of one processor's saved register values for another's.
How does the Linux kernel implement the shared memory mechanism between different processes?
To elaborate further, each process has its own address space. For example, an address of 0x1000 in Process A is a different location when compared to an address of 0x1000 in Process B.
So how does the kernel ensure that a piece of memory is shared between different process, having different address spaces?
Thanks in advance.
Interprocess Communication Mechanisms
Processes communicate with each other and with the kernel to coordinate their activities. Linux supports a number of Inter-Process Communication (IPC) mechanisms. Signals and pipes are two of them but Linux also supports the System V IPC mechanisms named after the Unix TM release in which they first appeared.
Signals
Signals are one of the oldest inter-process communication methods used by Unix TM systems. They are used to signal asynchronous events to one or more processes. A signal could be generated by a keyboard interrupt or an error condition such as the process attempting to access a non-existent location in its virtual memory. Signals are also used by the shells to signal job control commands to their child processes.
There are a set of defined signals that the kernel can generate or that can be generated by other processes in the system, provided that they have the correct privileges. You can list a system's set of signals using the kill command (kill -l).
Pipes
The common Linux shells all allow redirection. For example
$ ls | pr | lpr
pipes the output from the ls command listing the directory's files into the standard input of the pr command which paginates them. Finally the standard output from the pr command is piped into the standard input of the lpr command which prints the results on the default printer. Pipes then are unidirectional byte streams which connect the standard output from one process into the standard input of another process. Neither process is aware of this redirection and behaves just as it would normally. It is the shell which sets up these temporary pipes between the processes.
In Linux, a pipe is implemented using two file data structures which both point at the same temporary VFS inode which itself points at a physical page within memory. Figure shows that each file data structure contains pointers to different file operation routine vectors; one for writing to the pipe, the other for reading from the pipe.
Sockets
Message Queues: Message queues allow one or more processes to write messages, which will be read by one or more reading processes. Linux maintains a list of message queues, the msgque vector; each element of which points to a msqid_ds data structure that fully describes the message queue. When message queues are created a new msqid_ds data structure is allocated from system memory and inserted into the vector.
System V IPC Mechanisms: Linux supports three types of interprocess communication mechanisms that first appeared in Unix TM System V (1983). These are message queues, semaphores and shared memory. These System V IPC mechanisms all share common authentication methods. Processes may access these resources only by passing a unique reference identifier to the kernel via system calls. Access to these System V IPC objects is checked using access permissions, much like accesses to files are checked. The access rights to the System V IPC object is set by the creator of the object via system calls. The object's reference identifier is used by each mechanism as an index into a table of resources. It is not a straight forward index but requires some manipulation to generate the index.
Semaphores: In its simplest form a semaphore is a location in memory whose value can be tested and set by more than one process. The test and set operation is, so far as each process is concerned, uninterruptible or atomic; once started nothing can stop it. The result of the test and set operation is the addition of the current value of the semaphore and the set value, which can be positive or negative. Depending on the result of the test and set operation one process may have to sleep until the semphore's value is changed by another process. Semaphores can be used to implement critical regions, areas of critical code that only one process at a time should be executing.
Say you had many cooperating processes reading records from and writing records to a single data file. You would want that file access to be strictly coordinated. You could use a semaphore with an initial value of 1 and, around the file operating code, put two semaphore operations, the first to test and decrement the semaphore's value and the second to test and increment it. The first process to access the file would try to decrement the semaphore's value and it would succeed, the semaphore's value now being 0. This process can now go ahead and use the data file but if another process wishing to use it now tries to decrement the semaphore's value it would fail as the result would be -1. That process will be suspended until the first process has finished with the data file. When the first process has finished with the data file it will increment the semaphore's value, making it 1 again. Now the waiting process can be woken and this time its attempt to increment the semaphore will succeed.
Shared Memory: Shared memory allows one or more processes to communicate via memory that appears in all of their virtual address spaces. The pages of the virtual memory is referenced by page table entries in each of the sharing processes' page tables. It does not have to be at the same address in all of the processes' virtual memory. As with all System V IPC objects, access to shared memory areas is controlled via keys and access rights checking. Once the memory is being shared, there are no checks on how the processes are using it. They must rely on other mechanisms, for example System V semaphores, to synchronize access to the memory.
Quoted from tldp.org.
There are two kinds of shared memory in Linux.
If A and B are Parent process and Child process respectively, each of them uses their own pte to access the shared memory.The shared memory is shared by the fork mechanism. So every thing is good, right?(More details, please look at the kernel function copy_one_pte() and related functions.)
If A and B are not parent and Child, they use the a public key to access the shared memory.
Let's assume that A creates a shared memory though System V shmget() with a key and, correspondly, the kernel creates a file(file name is "SYSTEMV+key") for process A in a shmem/tmpfs which is an internal RAM-based filesystem. It's mounted by the kenrel(Check shmem_init()). And the shared memory region is handled by shmem/tmpfs. Basically, it's handled by the page fault mechanism when process A accesses the shared memory region.
If process B wants to access that shared memory region created by process A. Process B should use shmget() with the same key used by Process A. Then process B can find the file("SYSTEMV+key") and map the file into Process B's address space.
I use a Shared Memory area to get som data to a second process.
The first process uses CreateFileMapping(INVALID_HANDLE_VALUE, ..., PAGE_READWRITE, ...) and MapViewOfFile( ... FILE_MAP_WRITE).
The second process uses OpenFileMapping(FILE_MAP_WRITE, ...) and MapViewOfFile( ... FILE_MAP_WRITE).
The docs state:
Multiple views of a file mapping object
are coherent if they contain identical data at a specified time.
This occurs if the file views are derived from any file mapping object
that is backed by the same file. (...)
With one important exception, file views derived from any file mapping
object that is backed by the same file are coherent or identical at a
specific time. Coherency is guaranteed for views within a process and
for views that are mapped by different processes.
The exception is related to remote files. (...)
Since I'm just using the Shared Memory as is (backed by the paging file) I would have assumed that some synchronization is needed between processes to see a coherent view of the memory another process has written. I'm unsure however what synchronization would be needed exactly.
The current pattern I have (simplified) is like this:
Process1 | Process2
... | ...
/* write to shared mem, */ | ::WaitForSingleObject(hDataReady); // real code has error handling
/* then: */
::SetEvent(hDataReady); | /* read from shared mem after wait returns */
... | ...
Is this enough synchronization, even for shared memory?
What sync is needed in general between the two processes?
Note that inside of one single process, the call to SetEvent would certainly constitute a full memory barrier, but it isn't completely clear to me whether that holds for shared memory across processes.
I have since come to believe that for memory-access synchronization purposes, it really does not matter if the concurrently accessed memory is shared between processes or just withing one process between threads.
That is, for Shared Memory (the one shared between processes) on Windows, the same restrictions and guidelines apply as with "normal" memory within a process that is just shared between the threads of the process.
The reason I believe this is that a process and a thread are somewhat orthogonal on Windows. A process is a "container" for threads, and in order for the process to be able to do anything, it needs at least one thread. So, for memory that is mapped into multiple process' address space, the synchronization requirements on the threads running within these different processes should be actually the same as for threads running within the same process.
So, the answer to my question Is this enough synchronization, even for shared memory? is that shared memory requires the same synchronization as "normal" memory. But of course, not all synchronization techniques works across process boundaries, so you are restricted in what you can use. (A Critical Section for exampled cannot be used across processes.)
If both of those code snippets are in a loop then in addition to the event you'll need a mutex so that Process1 doesn't start writing again while Process2 is still reading. To be more specific, the mutex must be acquired before reading or writing and released after reading or writing. Make sure the mutex has been released before calling WFSO in Process2.
My understanding is that although Windows may guarantee view coherency, it does not guarantee a write is fully completed before the client reads it.
For example, if you were writing "Hello world!" to the view, it could only be partially written when the client reads it, such as "Hello w".
Therefore, the view would be byte coherent, but not message coherent.
Personally, I use a mutex to guarantee thread-safe access.
Use Semaphore should be better than Event.