Map buffer with glMapBuffer, then use pointer in different thread - opengl-es

I'm trying to optimize a program that issues all OpenGL ES calls on the main thread. The main performance issue seems to be frequent buffer uploads via glBufferData, more specifically a memcpy inside this function that runs synchronously on the main thread (the buffers are pretty large).
My current plan is to map the buffer on the main thread using glMapBuffer, then send the pointer to a different thread that performs the memcpy. Once that thread is finished, I would call glUnmapBuffer on the main thread again. After that, the buffer is used for rendering.
Would this approach work, or is it dangerous to use glMapBuffer pointers on a thread that doesn't have the GL context? Or is there another way to ensure no memcpy is performed on the main thread and everything is done on the pipeline thread?
Regards

Once you've mapped the buffer, the pointer is a "normal" CPU pointer, so it can be used just like any other CPU pointer, including cross-thread access.
Just make sure that you've completed all writes and synced the threads before calling glUnmapBuffer().
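
As a rough illustration of the ordering described above (map on the context thread, copy on a worker, join, then unmap), here is a minimal C++ sketch. FakeBuffer, map(), unmap() and upload_via_worker() are hypothetical stand-ins so the example runs without a GL context; in real code the map and unmap steps would be the GL calls (for example glMapBufferRange on ES 3.0 and glUnmapBuffer), made only on the thread that owns the context.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical stand-in for a GL buffer object (assumption, not a GL API).
// map() plays the role of the GL map call, unmap() the role of glUnmapBuffer().
struct FakeBuffer {
    std::vector<unsigned char> storage;
    bool mapped = false;

    void* map() { assert(!mapped); mapped = true; return storage.data(); }
    void unmap() { assert(mapped); mapped = false; }
};

// Runs on the "main" thread: map, hand the pointer to a worker, wait, unmap.
inline void upload_via_worker(FakeBuffer& buf, const std::vector<unsigned char>& src) {
    assert(src.size() <= buf.storage.size());
    void* dst = buf.map();                    // main thread: map the buffer

    std::thread worker([dst, &src] {          // worker: plain CPU copy, no GL calls
        if (!src.empty()) {
            std::memcpy(dst, src.data(), src.size());
        }
    });

    // The main thread may do unrelated work here, as long as it does not
    // touch the mapped buffer.
    worker.join();                            // sync: all worker writes are complete
    buf.unmap();                              // main thread: unmap only after the join
}
```

The join is the important part: it guarantees the worker's writes have finished and are visible before the unmap happens. A long-lived worker would use some other completion signal (a future, a condition variable) in the same place.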

Related

WriteFile with Overlapped IO and ERROR_DISK_FULL?

Wonder if anyone knows the internal design of WriteFile() (Storage Team here?) with overlapped IO for a file on a disk drive/file system. Clearly, when using the system buffer and standard synchronous WriteFile(), it checks for a full disk and allocates space prior to returning, because the system cache holding the actual data is written later (a problem then causes a delayed write error from the OS).
So the question is: would the same be true when using an OVERLAPPED structure for an asynchronous WriteFile() that expands the file beyond the free space? That is, would it return ERROR_DISK_FULL right away, before pending the IO?
The reason to know is for recovery: freeing disk space, or inserting new media, and resuming the writes. If it works this way, recovery is fairly straightforward. If the error arrives only after the IO is pended, you could have a bunch of queued IO that then has to be synchronized, with additional information tracked for all queued items so that offsets and such can be adjusted when moving to new media.
TIA!!
On what you mean by asynchronous file operations (WriteFile() etc.): these operations are only asynchronous for the caller. Internally they work the same way as synchronous (blocking) ones. The implementation of a blocking call invokes the non-blocking one and waits for an event, the same as if you were using the OVERLAPPED structure. So, on your question of whether WriteFile would return ERROR_DISK_FULL before pending the IO, the answer is no. The rationale of non-blocking calls is not to make disk operations return results faster, but to allow a single thread to do multiple I/O operations in parallel without the need to create multiple threads.
If there is not enough disk space to complete the write operation, you get ERROR_DISK_FULL (STATUS_DISK_FULL) when the I/O operation completes. Whether the filesystem driver completes your write request with STATUS_DISK_FULL right away (converted to ERROR_DISK_FULL), or first returns STATUS_PENDING (converted to ERROR_IO_PENDING by Win32) and then completes the I/O with STATUS_DISK_FULL, is undefined; it can be either. The final status will be ERROR_DISK_FULL, but you cannot assume whether the operation will complete synchronously or asynchronously.

A DLL should free heap memory only if the DLL is unloaded dynamically?

Question Purpose: Reality check on the MS docs of DllMain.
It is "common" knowledge that you shouldn't do too much in DllMain, there are definite things you must never do, some best practises.
I now stumbled over a new gem in the docs, that makes little sense to me: (emph. mine)
When handling DLL_PROCESS_DETACH, a DLL should free resources such as
heap memory only if the DLL is being unloaded dynamically (the
lpReserved parameter is NULL). If the process is terminating (the
lpvReserved parameter is non-NULL), all threads in the process except
the current thread either have exited already or have been explicitly
terminated by a call to the ExitProcess function, which might leave
some process resources such as heaps in an inconsistent state. In this
case, it is not safe for the DLL to clean up the resources. Instead,
the DLL should allow the operating system to reclaim the memory.
Since global C++ objects are cleaned up during DllMain/DETACH, this would imply that global C++ objects must not free any dynamic memory, because the heap may be in an inconsistent state. / When the DLL is "linked statically" to the executable. / Certainly not what I see out there - global C++ objects (iff there are) of various (ours, and third party) libraries allocate and deallocate just fine in their destructors. (Barring other ordering bugs, o.c.)
So, what specific technical problem is this warning aimed at?
Since the paragraph mentions thread termination, could there be a heap corruption problem when some threads are not cleaned up correctly?
The ExitProcess API in general does the following:
Enters the Loader Lock critical section.
Locks the main process heap (returned by GetProcessHeap()) via HeapLock(GetProcessHeap()) (really via RtlLockHeap). This is a very important step for avoiding deadlock.
Terminates all threads in the process except the current one (by calling NtTerminateProcess(0, 0)).
Calls LdrShutdownProcess. Inside this API the loader walks the loaded module list and sends DLL_PROCESS_DETACH with a non-null lpvReserved.
Finally calls NtTerminateProcess(NtCurrentProcess(), ExitCode), which terminates the process.
The problem here is that the threads are terminated at arbitrary points. For example, a thread can be allocating or freeing memory from any heap, and be inside that heap's critical section, when it is terminated. As a result, if code running during DLL_PROCESS_DETACH tries to free a block from the same heap, it deadlocks when trying to enter this heap's critical section (if the heap implementation uses one, of course).
Note that this does not affect the main process heap, because HeapLock is called for it before all threads (except the current one) are terminated. The purpose of this: the call waits until all other threads have left the process heap's critical section, and once the critical section is acquired, no other thread can enter it, because the main process heap is locked.
So, when the threads are terminated after the main heap has been locked, we can be sure that none of the killed threads is inside the main heap's critical section and that the heap structure is not in an inconsistent state, thanks to the RtlLockHeap call. But this applies only to the main process heap. Any other heaps in the process are not locked, so they can be in an inconsistent state during DLL_PROCESS_DETACH, or can be exclusively held by an already terminated thread.
So using HeapFree on GetProcessHeap, or calling LocalFree, is safe here (although this is not documented).
Using HeapFree on any other heap is not safe if DllMain is called during process termination.
Also, if you share other custom data structures between several threads, they can be in an inconsistent state, because the other threads (which may have been using them) were terminated at arbitrary points.
So this note warns that when the lpvReserved parameter is non-NULL (which means DllMain is being called during process termination), you need to be especially careful when cleaning up resources. In any case, all internal memory allocations will be freed by the operating system when the process dies.
As an addendum to RbMm's excellent answer, I'll add a quote from the ExitProcess docs that does a much better job than the DllMain docs at explaining why heap operations (or any operations, really) can be compromised:
If one of the terminated threads in the process holds a lock and the
DLL detach code in one of the loaded DLLs attempts to acquire the same
lock, then calling ExitProcess results in a deadlock. In contrast, if
a process terminates by calling TerminateProcess, the DLLs that the
process is attached to are not notified of the process termination.
Therefore, if you do not know the state of all threads in your
process, it is better to call TerminateProcess than ExitProcess. Note
that returning from the main function of an application results in a
call to ExitProcess.
So, it all boils down to: IFF your application has "runaway" threads that may hold any lock, the (CRT) heap lock being a prominent example, you have a big problem during shutdown when you need to access the same structures (e.g. the heap) that your "runaway" threads are using.
Which just goes to show that you should shut down all your threads in a controlled way.
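
The documented advice (free resources on DLL_PROCESS_DETACH only when lpvReserved is NULL) can be sketched as a small decision function. This is a portable model, not real DllMain code: ModuleState and on_process_detach() are hypothetical names, and `reserved` simply plays the role of lpvReserved.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical module state: one heap block owned by the DLL (assumption).
struct ModuleState {
    void* scratch = nullptr;
};

// Models the DLL_PROCESS_DETACH branch of DllMain. `reserved` stands in for
// lpvReserved: null for a dynamic unload (FreeLibrary), non-null when the
// process is terminating. Returns true if the cleanup actually ran.
inline bool on_process_detach(ModuleState& state, const void* reserved) {
    if (reserved != nullptr) {
        // Process termination: other threads were killed at arbitrary
        // points, so skip heap calls and let the OS reclaim the memory.
        return false;
    }
    std::free(state.scratch);   // dynamic unload: normal cleanup
    state.scratch = nullptr;
    return true;
}
```

In a real DLL the same check would sit inside the DLL_PROCESS_DETACH case of DllMain, and per the answer above it matters most for private heaps and for data structures shared with threads that may have been terminated mid-operation.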

Update vertex buffer in non UI thread using DirectX11

I want to render an object with a dynamic vertex buffer, and I do the rendering on the UI thread. Is it possible to change this vertex buffer's content on a non-UI thread using Map and Unmap?
Thanks.
YL
The Direct3D 11 multi-threading model is fairly simple:
Calls to the ID3D11Device are thread-safe (unless you used the D3D11_CREATE_DEVICE_SINGLETHREADED flag when you created the device). You can call the methods on this interface from any thread.
Calls to the ID3D11DeviceContext are not thread-safe, and you should only call methods on this interface for a given context from a single thread at a time.
This is why Map and Unmap are part of ID3D11DeviceContext rather than ID3D11Device or the ID3D11Resource itself, as was the case in Direct3D 10. The operation is inherently serial with other operations.
This means you should have a single thread using the immediate device context (and DXGI), and this should probably be the same thread as your main Windows message pump (for the reasons covered in DirectX Graphics Infrastructure (DXGI): Best Practices).
You could Map on the same thread as the one using the immediate context, marshal the pointer to another thread, and then Unmap it from the original thread when that other thread completes, but this is highly unlikely to improve performance.
See Introduction to Multithreading in Direct3D 11

CreateFileMapping and MapViewOfFile with interprocess (un)synchronized multithreaded access?

I use a Shared Memory area to get some data to a second process.
The first process uses CreateFileMapping(INVALID_HANDLE_VALUE, ..., PAGE_READWRITE, ...) and MapViewOfFile( ... FILE_MAP_WRITE).
The second process uses OpenFileMapping(FILE_MAP_WRITE, ...) and MapViewOfFile( ... FILE_MAP_WRITE).
The docs state:
Multiple views of a file mapping object
are coherent if they contain identical data at a specified time.
This occurs if the file views are derived from any file mapping object
that is backed by the same file. (...)
With one important exception, file views derived from any file mapping
object that is backed by the same file are coherent or identical at a
specific time. Coherency is guaranteed for views within a process and
for views that are mapped by different processes.
The exception is related to remote files. (...)
Since I'm just using the Shared Memory as is (backed by the paging file) I would have assumed that some synchronization is needed between processes to see a coherent view of the memory another process has written. I'm unsure however what synchronization would be needed exactly.
The current pattern I have (simplified) is like this:
Process1                          | Process2
...                               | ...
/* write to shared mem, then: */  | ::WaitForSingleObject(hDataReady); // real code has error handling
::SetEvent(hDataReady);           | /* read from shared mem after wait returns */
...                               | ...
Is this enough synchronization, even for shared memory?
What sync is needed in general between the two processes?
Note that inside of one single process, the call to SetEvent would certainly constitute a full memory barrier, but it isn't completely clear to me whether that holds for shared memory across processes.
I have since come to believe that for memory-access synchronization purposes, it really does not matter whether the concurrently accessed memory is shared between processes or just within one process between threads.
That is, for Shared Memory (the kind shared between processes) on Windows, the same restrictions and guidelines apply as for "normal" memory within a process that is just shared between the threads of the process.
The reason I believe this is that a process and a thread are somewhat orthogonal on Windows. A process is a "container" for threads, and in order for the process to be able to do anything, it needs at least one thread. So, for memory that is mapped into multiple processes' address spaces, the synchronization requirements on the threads running within these different processes should actually be the same as for threads running within the same process.
So, the answer to my question "Is this enough synchronization, even for shared memory?" is that shared memory requires the same synchronization as "normal" memory. But of course, not all synchronization techniques work across process boundaries, so you are restricted in what you can use. (A Critical Section, for example, cannot be used across processes.)
If both of those code snippets are in a loop, then in addition to the event you'll need a mutex so that Process1 doesn't start writing again while Process2 is still reading. To be more specific, the mutex must be acquired before reading or writing and released after reading or writing. Make sure the mutex has been released before calling WaitForSingleObject on the event in Process2.
My understanding is that although Windows may guarantee view coherency, it does not guarantee a write is fully completed before the client reads it.
For example, if you were writing "Hello world!" to the view, it could only be partially written when the client reads it, such as "Hello w".
Therefore, the view would be byte coherent, but not message coherent.
Personally, I use a mutex to guarantee thread-safe access.
Using a semaphore should be better than using an event.
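
The handshake discussed in these answers (signal "data ready", and keep the writer from overwriting while the reader is still reading) can be sketched with threads in one process, on the assumption stated above that the synchronization requirements are the same as across processes. This is only an analogue: Slot, write_message(), read_message() and run_exchange() are hypothetical names, std::string stands in for the shared view, and std::mutex plus std::condition_variable stand in for the named mutex and event a cross-process version would need.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Single-slot mailbox. `data` stands in for the shared memory view and
// `ready` for the signaled state of hDataReady (both are assumptions).
struct Slot {
    std::mutex m;
    std::condition_variable cv;
    std::string data;
    bool ready = false;
};

// Writer side ("Process1"): wait until the previous message was consumed,
// write the whole message under the lock, then signal.
inline void write_message(Slot& s, const std::string& msg) {
    std::unique_lock<std::mutex> lock(s.m);
    s.cv.wait(lock, [&] { return !s.ready; });  // never overwrite unread data
    s.data = msg;
    s.ready = true;
    lock.unlock();
    s.cv.notify_all();
}

// Reader side ("Process2"): wait for the signal, copy the message out under
// the lock, then mark the slot free again.
inline std::string read_message(Slot& s) {
    std::unique_lock<std::mutex> lock(s.m);
    s.cv.wait(lock, [&] { return s.ready; });
    std::string out = s.data;
    s.ready = false;
    lock.unlock();
    s.cv.notify_all();
    return out;
}

// Sends every message through the slot and returns what the reader saw.
inline std::vector<std::string> run_exchange(const std::vector<std::string>& msgs) {
    Slot slot;
    std::vector<std::string> received;
    std::thread reader([&] {
        for (std::size_t i = 0; i < msgs.size(); ++i) {
            received.push_back(read_message(slot));
        }
    });
    for (const auto& m : msgs) {
        write_message(slot, m);
    }
    reader.join();
    assert(received.size() == msgs.size());
    return received;
}
```

Because each message is written and read entirely under the lock, the reader never observes a half-written "Hello w"; that is the message-level coherence the view coherency guarantee alone does not provide.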

Windows: how to spawn threads from (NDIS) kernel driver?

Which function is recommended to spawn a new thread within NDIS5/6 context? Looking for something that is guaranteed to work at IRQL=PASSIVE (e.g. no bsods out of nothing); by a quick examination of ndis.h contents, found nothing.
Also, it is planned to use a newly spawned thread for calling upon NdisFreeMemory* family, will it be causing any problems to free allocated, but unused memory from a different thread?
Threading is outside the scope of NDIS. If you need to start a new thread, use the standard kernel routines (like PsCreateSystemThread). Note that timers and work items are usually sufficient for most miniport needs. It is unusual for an NDIS miniport to create its own thread, although I suppose there are valid cases where it might be a fair design.
It is ok to allocate memory on one thread and free it on another.
