In Kernel\include\linux\compiler.h
#define __acquire(x) __context__(x,1)
#define __release(x) __context__(x,-1)
Please help me to understand, in above statements what we are trying to achieve with context. I couldn't find Its details.
I crossed It while understanding spinlock implementation in linux kernel.
As from http://linux.die.net/man/1/sparse:
-Wcontext Warn about potential errors in synchronization or other delimited contexts. Sparse supports several means of designating
functions or statements that delimit contexts, such as
synchronization. Functions with the extended attribute
attribute((context(expression,in_context,out_context)) require the context expression (for instance, a lock) to have the value in_context
(a constant nonnegative integer) when called, and return with the
value out_context (a constant nonnegative integer). For APIs defined
via macros, use the statement form
context(expression,in_value,out_value) in the body of the macro.
With -Wcontext Sparse will warn when it sees a function change the
context without indicating this with a context attribute, either by
decreasing a context below zero (such as by releasing a lock without
acquiring it), or returning with a changed context (such as by
acquiring a lock without releasing it). Sparse will also warn about
blocks of code which may potentially execute with different contexts.
Sparse issues these warnings by default. To turn them off, use
-Wno-context.
Related
I was wondering if someone could explain me why Ruby's IO::pwrite function is said to be thread-safe in the documentation:
This is advantageous to combining IO#seek and IO#write in that it is
atomic, allowing multiple threads/process to share the same IO object
for reading the file at various location
My understanding of atomicity is that it's all or nothing, if an error is raised the "transaction" will be rolled back so in this case the file would be closed with its original contents (correct?).
Atomicity does not guarantee thread synchronization however, unless rb_thread_io_blocking_region is a synchronized method?
Here's a snippet of the source of the pwrite function, also available here
n = (ssize_t)rb_thread_io_blocking_region(internal_pwrite_func, &arg, fptr->fd);
if (n < 0) rb_sys_fail_path(fptr->pathv);
rb_str_tmp_frozen_release(str, tmp);
return SSIZET2NUM(n);
}
The synchronization is performed by the kernel (the operating system), not Ruby.
As per the documentation, Ruby's pwrite calls this pwrite which takes care of the synchronization.
The behavior of pwrite system call is described here. Specifically:
After a write() to a regular file has successfully returned:
Any successful read() from each byte position in the file that was
modified by that write shall return the data specified by the
write() for that position until such byte positions are again
modified.
Any subsequent successful write() to the same byte position in the
file shall overwrite that file data.
The extensive rationale discusses serialization in more detail.
I am creating a wrapper over c library that recieves some financial data and I want to collect it into python data type (dict with list of field names and list of lists with financial data fields).
On the c level there is function that starts "listening" to some port and when any event appears some user-defined function is called. This function is written in cython. Simplified example of such function is here:
cdef void default_listener(const event_data_t* data, int data_count, void* user_data):
cdef trade_t* trades = <trade_t*>data # cast recieved data according to expected type
cdef dict py_data = <dict>user_data # cast user_data to initial type(dict in our case)
for i in range(data_count):
# append to list in the dict that we passed to the function
# fields of recieved struct
py_data['data'].append([trades[i].price,
trades[i].size,
]
)
The problem: when there is only one python process with this function started, there are no problems, but if I start another python process and run the same function one of the processes will be terminated in undetermiined amount of time. I suppose that this happens because two functions that are called simultaniously in different processes may try to write to the same part of the memory. May this be the case?
If this is the case, are there any ways to prevent two processes use the same memory? Or maybe some lock can be established before the cython code starts to write?
P.S.: I also have read this article and according to it for each python process there is some memory allocated that does not intersect with parts for other processes. But it is unclear for me, is this allocated memory also available for underlying c functions or these functions have acces to another fields that may intersect
I'm taking a guess at the answer based on your comment - if it's wrong then I'll delete it, but I think it's likely enough to be right to be worth posting as an answer.
Python has a locking mechanism known as the Global Interpreter Lock (or GIL). This ensures that multiple threads don't attempt to access the same memory simultaneously (including memory internal to Python, that may not be obvious to the user).
Your Cython code will be working on the assumption that its thread holds the GIL. I strongly suspect that this isn't true, and so performing any operations on a Python object will likely cause a crash. One way to deal with this would be to follow this section of documentation in the C code that calls the Cython code. However, I suspect it's easier to handle in Cython.
First tell Cython that the function is "nogil" - it does not require the GIL:
cdef void default_listener(const event_data_t* data, int data_count, void* user_data) nogil:
If you try to compile now it will fail, since you use Python types within the function. To fix this, claim the GIL within your Cython code.
cdef void default_listener(...) nogil:
with gil:
default_listener_impl(...)
What I've done is put the implementation in a separate function that does require the GIL (i.e. doesn't have a nogil attached). The reason for this is that you can't put cdef statements in the with gil section (as you say in your comment) - they have to be outside it. However, you can't put cdef dict outside it, because it's a Python object. Therefore a separate function is the easiest solution. The separate function looks almost exactly like default_listener does now.
It's worth knowing that this isn't a complete locking mechanism - it's really only to protect the Python internals from being corrupted - an ordinary Python thread will release and regain the GIL periodically automatically, and that may be while you're "during" an operation. Cython won't release the GIL unless you tell it to (in this case, at the end of the with gil: block) so does hold an exclusive lock during this time. If you need finer control of locking then you may want to look at either the multithreading library, or wrapping some C locking library.
I wrote some code like this:
shared_ptr<int> r = make_shared<int>();
int *ar = r.get();
delete ar; // report double free or corruption
// still some code
When the code ran up to delete ar;, the program crashed, and reported "double free or corruption", I'm confused why double free? The "r" is still in the scope, and not popped-off from stack. Do the delete operator do something magic?? Does it know the raw pointer is handled by a smart pointer currently? and then counter in "r" be decremented to zero automatically?
I know the operations is not recommended, but I want to know why?
You are deleting a pointer that didn't come from new, so you have undefined behavior (anything can happen).
From cppreference on delete:
For the first (non-array) form, expression must be a pointer to an object type or a class type contextually implicitly convertible to such pointer, and its value must be either null or pointer to a non-array object created by a new-expression, or a pointer to a base subobject of a non-array object created by a new-expression. If expression is anything else, including if it is a pointer obtained by the array form of new-expression, the behavior is undefined.
If the allocation is done by new, we can be sure that the pointer we have is something we can use delete on. But in the case of shared_ptr.get(), we cannot be sure if we can use delete because it might not be the actual pointer returned by new.
shared_ptr<int> r = make_shared<int>();
There is no guarantee that this will call new int (which isn't strictly observable by the user anyway) or more generally new T (which is observable with a user defined, class specific operator new); in practice, it won't (there is no guarantee that it won't).
The discussion that follows isn't just about shared_ptr, but about "smart pointers" with ownership semantics. For any owning smart pointer smart_owning:
The primary motivation for make_owning instead of smart_owning<T>(new T) is to avoid having a memory allocation without owner at any time; that was essential in C++ when order of evaluation of expressions didn't provide the guarantee that evaluation of the sub-expressions in the argument list was immediately before call of that function; historically in C++:
f (smart_owning<T>(new T), smart_owning<U>(new U));
could be evaluated as:
T *temp1 = new T;
U *temp2 = new U;
auto &&temp3 = smart_owning<T>(temp1);
auto &&temp4 = smart_owning<U>(temp2);
This way temp1 and temp2 are not managed by any owning object for a non trivial time:
obviously new U can throw an exception
constructing an owning smart pointer usually requires the allocation of (small) ressources and can throw
So either temp1 or temp2 could be leaked (but not both) if an exception was thrown, which was the exact problem we were trying to avoid in the first place. This means composite expressions involving construction of owning smart pointers was a bad idea; this is fine:
auto &&temp_t = smart_owning<T>(new T);
auto &&temp_u = smart_owning<U>(new U);
f (temp_t, temp_u);
Usually expression involving as many sub-expression with function calls as f (smart_owning<T>(new T), smart_owning<U>(new U)) are considered reasonable (it's a pretty simple expression in term of number of sub-expressions). Disallowing such expressions is quite annoying and very difficult to justify.
[This is one reason, and in my opinion the most compelling reason, why the non determinism of the order of evaluation was removed by the C++ standardisation committee so that such code is not safe. (This was an issue not just for memory allocated, but for any managed allocation, like file descriptors, database handles...)]
Because code frequently needed to do things such as smart_owning<T>(allocate_T()) in sub-expressions, and because telling programmers to decompose moderately complex expressions involving allocation in many simple lines wasn't appealing (more lines of code doesn't mean easier to read), the library writers provided a simple fix: a function to do the creation of an object with dynamic lifetime and the creation of its owning object together. That solved the order of evaluation problem (but was complicated at first because it needed perfect forwarding of the arguments of the constructor).
Giving two tasks to a function (allocate an instance of T and a instance of smart_owning) gives the freedom to do an interesting optimization: you can avoid one dynamic allocation by putting both the managed object and its owner next to each others.
But once again, that was not the primary purpose of functions like make_shared.
Because exclusive ownership smart pointers by definition don't need to keep a reference count, and by definition don't need to share the data needed for the deleter either between instances, and so can keep that data in the "smart pointer"(*), no additional allocation is needed for the construction of unique_ptr; yet a make_unique function template was added, to avoid the dangling pointer issue, not to optimize a non-thing (an allocation that isn't done in the fist place).
(*) which BTW means unique owner "smart pointers" do not have pointer semantic, as pointer semantic implies that you can makes copies of the "pointer", and you can't have two copies of a unique owner pointing to the same instance; "smart pointers" were never pointers anyway, the term is misleading.
Summary:
make_shared<T> does an optional optimization where there is no separate dynamic memory allocation for T: there is no operator new(sizeof (T)). There is obviously still the creation of an instance with dynamic lifetime with another operator new: placement new.
If you replace the explicit memory deallocation with an explicit destruction and add a pause immediately after that point:
class C {
public:
~C();
};
shared_ptr<C> r = make_shared<C>();
C *ar = r.get();
ar->~C();
pause(); // stops the program forever
The program will probably run fine; it is still illogical, indefensible, incorrect to explicitly destroy an object managed by a smart pointer. It isn't "your" resource. If pause() could exit with an exception, the owning smart pointer would try to destroy the managed object which doesn't even exist anymore.
It of course depends on how library implements make_shared, however most probable implementation is that:
std::make_shared allocates one block for two things:
shared pointer control block
contained object
std::make_shared() will invoke memory allocator once and then it will call placement new twice to initialize (call constructors) of mentioned two things.
| block requested from allocator |
| shared_ptr control block | X object |
#1 #2 #3
That means that memory allocator has provided one big block, which address is #1.
Shared pointer then uses it for control block (#1) and actual contained object (#2).
When you invoke delete with actual object kept by shred_ptr ( .get() ) you call delete(#2).
Because #2 is not known by allocator you get an corruption error.
See here. I quot:
std::shared_ptr is a smart pointer that retains shared ownership of an object through a pointer. Several shared_ptr objects may own the same object. The object is destroyed and its memory deallocated when either of the following happens:
the last remaining shared_ptr owning the object is destroyed;
the last remaining shared_ptr owning the object is assigned another pointer via operator= or reset().
The object is destroyed using delete-expression or a custom deleter that is supplied to shared_ptr during construction.
So the pointer is deleted by shared_ptr. You're not suppose to delete the stored pointer yourself
UPDATE:
I didn't realize that there were more statements and the pointer was not out of scope, I'm sorry.
I was reading more and the standard doesn't say much about the behavior of get() but here is a note, I quote:
A shared_ptr may share ownership of an object while storing a pointer to another object. get() returns the stored pointer, not the managed pointer.
So it looks that it is allowed that the pointer returned by get() is not necessarily the same pointer allocated by the shared_ptr (presumably using new). So delete that pointer is undefined behavior. I will be looking a little more into the details.
UPDATE 2:
The standard says at § 20.7.2.2.6 (about make_shared):
6 Remarks: Implementations are encouraged, but not required, to perform no more than one memory allocation. [ Note: This provides efficiency equivalent to an intrusive smart pointer. — end note ]
7 [ Note: These functions will typically allocate more memory than sizeof(T) to allow for internal bookkeeping structures such as the reference counts. — end note ]
So an specific implementation of make_shared could allocate a single chunk of memory (or more) and use part of that memory to initialize the stored pointer (but maybe not all the memory allocated). get() must return a pointer to the stored object, but there is no requirement by the standard, as previously said, that the pointer returned by get() has to be the one allocated by new. So delete that pointer is undefined behavior, you got a signal raised but anything can happen.
Recently I read the source of leveldb, the source url is https://leveldb.googlecode.com/files/leveldb-1.13.0.tar.gz
And when I read db/db_impl.cc,there comes the following code:
mutex_.AssertHeld()
I follow it into file port/port_posix.h,and I find the following :
void AssertHeld() { }
Then I grep in the souce dir,but can't find anyother implementation of the AssertHeld() anymore.
So here is my question,what does the mutex_.AssertHeld() do in db/db_impl.cc? THX
As you have observed it does nothing in the default implementation. The function seems to be a placeholder for checking whether a particular thread holds a mutex and optionally abort if it doesn't. This would be equivalent to the normal asserts we use for variables but applied on mutexes.
I think the reason it is not implemented yet is we don't have an equivalent light weight function to assert whether a thread holds a lock in pthread_mutex_t used in the default implementation. Some platforms which has that capability could fill this implementation as part of porting process. Searching online I did find some implementation for this function in the windows port of leveldb. I can see one way to implement it using a wrapper class over pthread_mutex_t and setting some sort of a thread id variable to indicate which thread(s) currently holds the mutex, but it will have to be carefully implemented given the race conditions that can arise.
Consider a complex, memory hungry, multi threaded application running within a 32bit address space on windows XP.
Certain operations require n large buffers of fixed size, where only one buffer needs to be accessed at a time.
The application uses a pattern where some address space the size of one buffer is reserved early and is used to contain the currently needed buffer.
This follows the sequence:
(initial run) VirtualAlloc -> VirtualFree -> MapViewOfFileEx
(buffer changes) UnMapViewOfFile -> MapViewOfFileEx
Here the pointer to the buffer location is provided by the call to VirtualAlloc and then that same location is used on each call to MapViewOfFileEx.
The problem is that windows does not (as far as I know) provide any handshake type operation for passing the memory space between the different users.
Therefore there is a small opportunity (at each -> in my above sequence) where the memory is not locked and another thread can jump in and perform an allocation within the buffer.
The next call to MapViewOfFileEx is broken and the system can no longer guarantee that there will be a big enough space in the address space for a buffer.
Obviously refactoring to use smaller buffers reduces the rate of failures to reallocate space.
Some use of HeapLock has had some success but this still has issues - something still manages to steal some memory from within the address space.
(We tried Calling GetProcessHeaps then using HeapLock to lock all of the heaps)
What I'd like to know is there anyway to lock a specific block of address space that is compatible with MapViewOfFileEx?
Edit: I should add that ultimately this code lives in a library that gets called by an application outside of my control
You could brute force it; suspend every thread in the process that isn't the one performing the mapping, Unmap/Remap, unsuspend the suspended threads. It ain't elegant, but it's the only way I can think of off-hand to provide the kind of mutual exclusion you need.
Have you looked at creating your own private heap via HeapCreate? You could set the heap to your desired buffer size. The only remaining problem is then how to get MapViewOfFileto use your private heap instead of the default heap.
I'd assume that MapViewOfFile internally calls GetProcessHeap to get the default heap and then it requests a contiguous block of memory. You can surround the call to MapViewOfFile with a detour, i.e., you rewire the GetProcessHeap call by overwriting the method in memory effectively inserting a jump to your own code which can return your private heap.
Microsoft has published the Detour Library that I'm not directly familiar with however. I know that detouring is surprisingly common. Security software, virus scanners etc all use such frameworks. It's not pretty, but may work:
HANDLE g_hndPrivateHeap;
HANDLE WINAPI GetProcessHeapImpl() {
return g_hndPrivateHeap;
}
struct SDetourGetProcessHeap { // object for exception safety
SDetourGetProcessHeap() {
// put detour in place
}
~SDetourGetProcessHeap() {
// remove detour again
}
};
void MapFile() {
g_hndPrivateHeap = HeapCreate( ... );
{
SDetourGetProcessHeap d;
MapViewOfFile(...);
}
}
These may also help:
How to replace WinAPI functions calls in the MS VC++ project with my own implementation (name and parameters set are the same)?
How can I hook Windows functions in C/C++?
http://research.microsoft.com/pubs/68568/huntusenixnt99.pdf
Imagine if I came to you with a piece of code like this:
void *foo;
foo = malloc(n);
if (foo)
free(foo);
foo = malloc(n);
Then I came to you and said, help! foo does not have the same address on the second allocation!
I'd be crazy, right?
It seems to me like you've already demonstrated clear knowledge of why this doesn't work. There's a reason that the documention for any API that takes an explicit address to map into lets you know that the address is just a suggestion, and it can't be guaranteed. This also goes for mmap() on POSIX.
I would suggest you write the program in such a way that a change in address doesn't matter. That is, don't store too many pointers to quantities inside the buffer, or if you do, patch them up after reallocation. Similar to the way you'd treat a buffer that you were going to pass into realloc().
Even the documentation for MapViewOfFileEx() explicitly suggests this:
While it is possible to specify an address that is safe now (not used by the operating system), there is no guarantee that the address will remain safe over time. Therefore, it is better to let the operating system choose the address. In this case, you would not store pointers in the memory mapped file, you would store offsets from the base of the file mapping so that the mapping can be used at any address.
Update from your comments
In that case, I suppose you could:
Not map into contiguous blocks. Perhaps you could map in chunks and write some intermediate function to decide which to read from/write to?
Try porting to 64 bit.
As the earlier post suggests, you can suspend every thread in the process while you change the memory mappings. You can use SuspendThread()/ResumeThread() for that. This has the disadvantage that your code has to know about all the other threads and hold thread handles for them.
An alternative is to use the Windows debug API to suspend all threads. If a process has a debugger attached, then every time the process faults, Windows will suspend all of the process's threads until the debugger handles the fault and resumes the process.
Also see this question which is very similar, but phrased differently:
Replacing memory mappings atomically on Windows